Regression Analysis¶

In [241]:
import pandas as pd
import numpy as np
import altair as alt
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

Customer data¶

In [242]:
customer_df = pd.read_csv('final/customers.csv', sep='|')
In [243]:
customer_df.head()
Out[243]:
ssn cc_num first last gender street city state zip lat long city_pop job dob acct_num
0 115-04-4507 4218196001337 Jonathan Johnson M 863 Lawrence Valleys Ambler PA 19002 40.1809 -75.2156 32412 Accounting technician 1959-10-03 888022315787
1 715-55-5575 4351161559407816183 Elaine Fuller F 310 Kendra Common Apt. 164 Leland NC 28451 34.2680 -78.0578 27112 Professor Emeritus 1963-06-07 917558277935
2 167-48-5821 4192832764832 Michael Cameron M 05641 Robin Port Cordova SC 29039 33.4275 -80.8857 4215 International aid/development worker 1973-05-30 718172762479
3 406-83-7518 4238849696532874 Brandon Williams M 26916 Carlson Mountain Birmingham AL 35242 33.3813 -86.7046 493806 Seismic interpreter 1942-12-26 947268892251
4 697-93-1877 4514627048281480 Lisa Hernandez F 809 Burns Creek Fargo GA 31631 30.7166 -82.5801 559 Medical laboratory scientific officer 1939-05-22 888335239225
In [244]:
customer_df['acct_num'].nunique()
Out[244]:
1000
In [245]:
customer_df.shape
Out[245]:
(1000, 15)
In [246]:
customer_df.columns
Out[246]:
Index(['ssn', 'cc_num', 'first', 'last', 'gender', 'street', 'city', 'state',
       'zip', 'lat', 'long', 'city_pop', 'job', 'dob', 'acct_num'],
      dtype='object')

Transaction data¶

In [247]:
import os

directory = './final'
dfs = []

for filename in os.listdir(directory):
    if filename.startswith('transactions'):
        print(f'Reading {filename}...')
        filepath = os.path.join(directory, filename)
        df = pd.read_csv(filepath, sep='|')
        dfs.append(df)
    else:
        print(f'{filename} is not a CSV file. Skipping...')

if len(dfs) > 0:
    transaction_df = pd.concat(dfs, ignore_index=True)
    print(f'Successfully merged {len(dfs)} dataframes into one.')
else:
    print('No CSV files found in directory.')
Reading transactions_12.csv...
Reading transactions_126.csv...
Reading transactions_127.csv...
Reading transactions_13.csv...
customers.csv is not a CSV file. Skipping...
Reading transactions_39.csv...
Reading transactions_11.csv...
Reading transactions_119.csv...
Reading transactions_131.csv...
Reading transactions_125.csv...
Reading transactions_124.csv...
Reading transactions_130.csv...
Reading transactions_118.csv...
Reading transactions_10.csv...
Reading transactions_38.csv...
Reading transactions_14.csv...
Reading transactions_28.csv...
Reading transactions_120.csv...
Reading transactions_108.csv...
Reading transactions_109.csv...
Reading transactions_121.csv...
Reading transactions_29.csv...
Reading transactions_15.csv...
Reading transactions_17.csv...
Reading transactions_123.csv...
Reading transactions_122.csv...
Reading transactions_16.csv...
Reading transactions_59.csv...
Reading transactions_65.csv...
Reading transactions_71.csv...
Reading transactions_7.csv...
Reading transactions_6.csv...
Reading transactions_70.csv...
Reading transactions_64.csv...
Reading transactions_58.csv...
Reading transactions_72.csv...
Reading transactions_4.csv...
Reading transactions_66.csv...
Reading transactions_99.csv...
Reading transactions_98.csv...
Reading transactions_67.csv...
Reading transactions_5.csv...
Reading transactions_73.csv...
Reading transactions_77.csv...
Reading transactions_1.csv...
Reading transactions_63.csv...
Reading transactions_88.csv...
Reading transactions_89.csv...
Reading transactions_62.csv...
Reading transactions_0.csv...
Reading transactions_76.csv...
Reading transactions_60.csv...
Reading transactions_74.csv...
Reading transactions_2.csv...
Reading transactions_48.csv...
Reading transactions_49.csv...
Reading transactions_3.csv...
Reading transactions_75.csv...
Reading transactions_61.csv...
Reading transactions_78.csv...
Reading transactions_44.csv...
Reading transactions_50.csv...
Reading transactions_87.csv...
Reading transactions_93.csv...
Reading transactions_92.csv...
Reading transactions_86.csv...
Reading transactions_51.csv...
Reading transactions_45.csv...
Reading transactions_79.csv...
Reading transactions_53.csv...
Reading transactions_47.csv...
Reading transactions_90.csv...
Reading transactions_84.csv...
Reading transactions_85.csv...
Reading transactions_91.csv...
Reading transactions_46.csv...
Reading transactions_52.csv...
Reading transactions_56.csv...
Reading transactions_42.csv...
Reading transactions_8.csv...
Reading transactions_95.csv...
Reading transactions_81.csv...
Reading transactions_80.csv...
Reading transactions_94.csv...
Reading transactions_9.csv...
Reading transactions_43.csv...
Reading transactions_57.csv...
Reading transactions_41.csv...
Reading transactions_55.csv...
Reading transactions_69.csv...
Reading transactions_82.csv...
Reading transactions_96.csv...
Reading transactions_97.csv...
Reading transactions_83.csv...
Reading transactions_68.csv...
Reading transactions_54.csv...
Reading transactions_40.csv...
Reading transactions_27.csv...
Reading transactions_33.csv...
Reading transactions_107.csv...
Reading transactions_113.csv...
Reading transactions_112.csv...
Reading transactions_106.csv...
Reading transactions_32.csv...
Reading transactions_26.csv...
Reading transactions_18.csv...
Reading transactions_30.csv...
Reading transactions_24.csv...
Reading transactions_110.csv...
Reading transactions_104.csv...
Reading transactions_105.csv...
Reading transactions_111.csv...
Reading transactions_25.csv...
Reading transactions_31.csv...
Reading transactions_19.csv...
Reading transactions_35.csv...
Reading transactions_21.csv...
Reading transactions_115.csv...
Reading transactions_101.csv...
Reading transactions_129.csv...
Reading transactions_128.csv...
Reading transactions_100.csv...
Reading transactions_114.csv...
Reading transactions_20.csv...
Reading transactions_34.csv...
Reading transactions_22.csv...
Reading transactions_36.csv...
Reading transactions_102.csv...
Reading transactions_116.csv...
Reading transactions_117.csv...
Reading transactions_103.csv...
Reading transactions_37.csv...
Reading transactions_23.csv...
Successfully merged 132 dataframes into one.
In [248]:
transaction_df.head()
Out[248]:
cc_num acct_num trans_num unix_time category amt is_fraud merchant merch_lat merch_long
0 4896331812335761701 149852234418 f3ec0819590302134f03ffdc2f44697f 1646060228 gas_transport 65.17 0 Larson, Ryan and Huang 38.143430 -90.327335
1 4896331812335761701 149852234418 c1607c993e41f2c3b42d72d1506bef7b 1644848624 gas_transport 47.58 0 Myers-Reed 39.119498 -90.760379
2 4896331812335761701 149852234418 6f530db25d20fe351249a54491fd3fde 1645632153 gas_transport 64.43 0 Baker-Bullock 39.384368 -90.361517
3 4896331812335761701 149852234418 6d11805f2acd938fec99376001afafe8 1645311286 gas_transport 82.47 0 Spencer-Hall 39.443567 -89.752400
4 4896331812335761701 149852234418 605342f297c575cb1ccf2c08cad082ee 1641571926 gas_transport 50.28 0 King, Rodriguez and Hancock 38.857278 -89.609525
In [249]:
transaction_df.shape
Out[249]:
(4260904, 10)
In [250]:
transaction_df.columns
Out[250]:
Index(['cc_num', 'acct_num', 'trans_num', 'unix_time', 'category', 'amt',
       'is_fraud', 'merchant', 'merch_lat', 'merch_long'],
      dtype='object')
In [251]:
transaction_df['acct_num'].nunique()
Out[251]:
983

There were 1,000 unique cusomters in customer_df, while there were 983 unique customers in transaction_df

In [252]:
df = customer_df.merge(transaction_df, on=['cc_num', 'acct_num'])
In [253]:
df.head()
Out[253]:
ssn cc_num first last gender street city state zip lat long city_pop job dob acct_num trans_num unix_time category amt is_fraud merchant merch_lat merch_long
0 115-04-4507 4218196001337 Jonathan Johnson M 863 Lawrence Valleys Ambler PA 19002 40.1809 -75.2156 32412 Accounting technician 1959-10-03 888022315787 91ab12e73ef38206e1121e9648d2408d 1558719550 gas_transport 69.12 0 Phillips Group 39.491416 -75.588522
1 115-04-4507 4218196001337 Jonathan Johnson M 863 Lawrence Valleys Ambler PA 19002 40.1809 -75.2156 32412 Accounting technician 1959-10-03 888022315787 071553d533a6822a4431c354c434ddcb 1569425519 grocery_pos 68.11 0 Tucker Ltd 40.890319 -75.573359
2 115-04-4507 4218196001337 Jonathan Johnson M 863 Lawrence Valleys Ambler PA 19002 40.1809 -75.2156 32412 Accounting technician 1959-10-03 888022315787 0cfad38ef15e4749eff68dc83f62c151 1577205601 misc_net 40.35 0 Dixon PLC 39.244958 -74.475327
3 115-04-4507 4218196001337 Jonathan Johnson M 863 Lawrence Valleys Ambler PA 19002 40.1809 -75.2156 32412 Accounting technician 1959-10-03 888022315787 5782693d7c70f062f258cb30bfa8900f 1571428238 grocery_pos 96.22 0 Lambert-Cooper 39.656925 -75.802342
4 115-04-4507 4218196001337 Jonathan Johnson M 863 Lawrence Valleys Ambler PA 19002 40.1809 -75.2156 32412 Accounting technician 1959-10-03 888022315787 35fd7db657d7e30dd608c37f7798186e 1549840400 gas_transport 71.89 0 Griffith LLC 40.313342 -74.220434
In [254]:
df['acct_num'].nunique()
Out[254]:
983
In [255]:
df.shape
Out[255]:
(4260904, 23)
In [256]:
df.columns
Out[256]:
Index(['ssn', 'cc_num', 'first', 'last', 'gender', 'street', 'city', 'state',
       'zip', 'lat', 'long', 'city_pop', 'job', 'dob', 'acct_num', 'trans_num',
       'unix_time', 'category', 'amt', 'is_fraud', 'merchant', 'merch_lat',
       'merch_long'],
      dtype='object')
In [257]:
df.isna().sum()
Out[257]:
ssn           0
cc_num        0
first         0
last          0
gender        0
street        0
city          0
state         0
zip           0
lat           0
long          0
city_pop      0
job           0
dob           0
acct_num      0
trans_num     0
unix_time     0
category      0
amt           0
is_fraud      0
merchant      0
merch_lat     0
merch_long    0
dtype: int64
In [258]:
df.duplicated().sum()
Out[258]:
0
In [259]:
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4260904 entries, 0 to 4260903
Data columns (total 23 columns):
 #   Column      Dtype  
---  ------      -----  
 0   ssn         object 
 1   cc_num      object 
 2   first       object 
 3   last        object 
 4   gender      object 
 5   street      object 
 6   city        object 
 7   state       object 
 8   zip         int64  
 9   lat         float64
 10  long        float64
 11  city_pop    int64  
 12  job         object 
 13  dob         object 
 14  acct_num    object 
 15  trans_num   object 
 16  unix_time   object 
 17  category    object 
 18  amt         float64
 19  is_fraud    object 
 20  merchant    object 
 21  merch_lat   float64
 22  merch_long  float64
dtypes: float64(5), int64(2), object(16)
memory usage: 780.2+ MB

Data Dictionary¶

  • cc_num: The credit card number used in the transaction
  • first: The first name of credit card holder
  • last: The last name of credit card holder
  • gender: The gender of the credit card holder
  • street: The street address of the credit card holder
  • city: The city of the credit card holder
  • state: The state of the credit card holder
  • zip: The zip code of the credit card holder
  • lat: The latitude of the credit card holder
  • long: The longitude of the credit card holder
  • city_pop: The population of the city where the customer live
  • job: The occupation of the credit card holder
  • dob: The date of birth of the credit card holder
  • acct_num: The account number of the credit card holder
  • trans_num: The transaction number of the credit card holder
  • unix_time: The transaction time in Unix time
  • amt: The transaction amount
  • is_fraud: Whether the transaction is fraudulent (1) or not (0)
  • merchant: The merchant where the transaction took place
  • merch_lat: The latitude of merchant
  • merch_long: The longitude of merchant
In [260]:
df['unix_time'].nunique()
Out[260]:
4114752
In [261]:
df_cleaned = df.copy()

In the data preparation stage for regression analysis, we will undertake specific steps to ensure the dataset is appropriately prepared for the business case of predicting the next month’s spending. The following actions will be performed:

Conversion of Unix Time: The Unix Time values will be transformed into a more interpretable format called ‘trasn_month_year’. This conversion will provide the transaction month and year information, allowing us to analyze trends over time and establishes a chronological framework for predicting future spending.

Age Calculations: By utilizing the date of birth information (‘dob’), we will be able to calculate the age of each customer. Age can be a relevant factor affecting spending behaviour, and we will categorize the age variables into seven bins: <18, 18-24, 25-34, 35-44, 45-54, 55-64, and 65+, which will enable us to evaluate the impact of age on future transactions predictions.

Filtering out Fraudulent Transactions: To ensure the accuracy and reliability of our regression analysis, we will filter out fraudulent transactions from the dataset. By excluding the fraudulent transactions, we focus solely on the legitimate customers' spending patterns, which are crucial for predicting the next month’s spending.

Grouping and Aggregating Data: The dataset will be grouped by the 'trans_month_year' variable to examine monthly spending patterns. Transaction data will be aggregated by calculating the sum, mean, maximum, minimum, and count of transactions for each month. Additionally, the transaction data will be flattened to include separate columns for the total and number of transactions in each month. These aggregated statistics will provide valuable insights into customer spending behaviour over time and facilitate accurate predictions for the next month's spending.

Inclusion of Customer Demographics: To enrich the dataset, we will incorporate customer demographics such as gender, age, job, city, and age_grouped. These variables will provide additional insight into the factor that influence spending behaviour and enhance the accuracy of our regression model.

In [262]:
df_cleaned['dob'] = pd.to_datetime(df_cleaned['dob'])

Convert unix_time to trans_month_year, and quarter¶

In [263]:
df_cleaned['trans_date_time'] = pd.to_datetime(df_cleaned['unix_time'], unit='s')
In [264]:
df_cleaned['trans_month_year'] = pd.to_datetime(df_cleaned['trans_date_time']).dt.to_period('M')
df_cleaned['quarter'] = pd.to_datetime(df_cleaned['trans_date_time']).dt.to_period('Q')
In [265]:
df_cleaned['first_trans_month_year'] = df_cleaned.groupby('acct_num')['trans_month_year'].transform('min')
df_cleaned['first_trans_month_year'] = df_cleaned['first_trans_month_year'].dt.to_timestamp()
age = ((df_cleaned['first_trans_month_year'] - df_cleaned['dob']).dt.days / 365.25).apply(round)
df_cleaned['age'] = age
In [266]:
df_cleaned.head()
Out[266]:
ssn cc_num first last gender street city state zip lat long city_pop job dob acct_num trans_num unix_time category amt is_fraud merchant merch_lat merch_long trans_date_time trans_month_year quarter first_trans_month_year age
0 115-04-4507 4218196001337 Jonathan Johnson M 863 Lawrence Valleys Ambler PA 19002 40.1809 -75.2156 32412 Accounting technician 1959-10-03 888022315787 91ab12e73ef38206e1121e9648d2408d 1558719550 gas_transport 69.12 0 Phillips Group 39.491416 -75.588522 2019-05-24 17:39:10 2019-05 2019Q2 2018-12-01 59
1 115-04-4507 4218196001337 Jonathan Johnson M 863 Lawrence Valleys Ambler PA 19002 40.1809 -75.2156 32412 Accounting technician 1959-10-03 888022315787 071553d533a6822a4431c354c434ddcb 1569425519 grocery_pos 68.11 0 Tucker Ltd 40.890319 -75.573359 2019-09-25 15:31:59 2019-09 2019Q3 2018-12-01 59
2 115-04-4507 4218196001337 Jonathan Johnson M 863 Lawrence Valleys Ambler PA 19002 40.1809 -75.2156 32412 Accounting technician 1959-10-03 888022315787 0cfad38ef15e4749eff68dc83f62c151 1577205601 misc_net 40.35 0 Dixon PLC 39.244958 -74.475327 2019-12-24 16:40:01 2019-12 2019Q4 2018-12-01 59
3 115-04-4507 4218196001337 Jonathan Johnson M 863 Lawrence Valleys Ambler PA 19002 40.1809 -75.2156 32412 Accounting technician 1959-10-03 888022315787 5782693d7c70f062f258cb30bfa8900f 1571428238 grocery_pos 96.22 0 Lambert-Cooper 39.656925 -75.802342 2019-10-18 19:50:38 2019-10 2019Q4 2018-12-01 59
4 115-04-4507 4218196001337 Jonathan Johnson M 863 Lawrence Valleys Ambler PA 19002 40.1809 -75.2156 32412 Accounting technician 1959-10-03 888022315787 35fd7db657d7e30dd608c37f7798186e 1549840400 gas_transport 71.89 0 Griffith LLC 40.313342 -74.220434 2019-02-10 23:13:20 2019-02 2019Q1 2018-12-01 59
In [267]:
import seaborn as sns
import matplotlib.pyplot as plt
palette='ch:.25'
In [268]:
pred_trans_df = df_cleaned.copy()

There are 14 unique category

In [269]:
pred_trans_df['category'].nunique()
Out[269]:
14

There are 939 unique zip code

In [270]:
pred_trans_df['zip'].nunique()
Out[270]:
939

There are 726 unique city

In [271]:
pred_trans_df['city'].nunique()
Out[271]:
726

There are 51 unique state

In [272]:
pred_trans_df['state'].nunique()
Out[272]:
51

There are 505 unique job

In [273]:
pred_trans_df['job'].nunique()
Out[273]:
505

We will filter only is_fraud = 0, since we don't want to have any frualent transaction¶

In [274]:
pred_trans_df = pred_trans_df[pred_trans_df['is_fraud'] == 0]
In [275]:
pred_trans_df.shape
Out[275]:
(4255870, 28)
In [276]:
pred_trans_df.columns
Out[276]:
Index(['ssn', 'cc_num', 'first', 'last', 'gender', 'street', 'city', 'state',
       'zip', 'lat', 'long', 'city_pop', 'job', 'dob', 'acct_num', 'trans_num',
       'unix_time', 'category', 'amt', 'is_fraud', 'merchant', 'merch_lat',
       'merch_long', 'trans_date_time', 'trans_month_year', 'quarter',
       'first_trans_month_year', 'age'],
      dtype='object')
In [277]:
pred_trans_df.head()
Out[277]:
ssn cc_num first last gender street city state zip lat long city_pop job dob acct_num trans_num unix_time category amt is_fraud merchant merch_lat merch_long trans_date_time trans_month_year quarter first_trans_month_year age
0 115-04-4507 4218196001337 Jonathan Johnson M 863 Lawrence Valleys Ambler PA 19002 40.1809 -75.2156 32412 Accounting technician 1959-10-03 888022315787 91ab12e73ef38206e1121e9648d2408d 1558719550 gas_transport 69.12 0 Phillips Group 39.491416 -75.588522 2019-05-24 17:39:10 2019-05 2019Q2 2018-12-01 59
1 115-04-4507 4218196001337 Jonathan Johnson M 863 Lawrence Valleys Ambler PA 19002 40.1809 -75.2156 32412 Accounting technician 1959-10-03 888022315787 071553d533a6822a4431c354c434ddcb 1569425519 grocery_pos 68.11 0 Tucker Ltd 40.890319 -75.573359 2019-09-25 15:31:59 2019-09 2019Q3 2018-12-01 59
2 115-04-4507 4218196001337 Jonathan Johnson M 863 Lawrence Valleys Ambler PA 19002 40.1809 -75.2156 32412 Accounting technician 1959-10-03 888022315787 0cfad38ef15e4749eff68dc83f62c151 1577205601 misc_net 40.35 0 Dixon PLC 39.244958 -74.475327 2019-12-24 16:40:01 2019-12 2019Q4 2018-12-01 59
3 115-04-4507 4218196001337 Jonathan Johnson M 863 Lawrence Valleys Ambler PA 19002 40.1809 -75.2156 32412 Accounting technician 1959-10-03 888022315787 5782693d7c70f062f258cb30bfa8900f 1571428238 grocery_pos 96.22 0 Lambert-Cooper 39.656925 -75.802342 2019-10-18 19:50:38 2019-10 2019Q4 2018-12-01 59
4 115-04-4507 4218196001337 Jonathan Johnson M 863 Lawrence Valleys Ambler PA 19002 40.1809 -75.2156 32412 Accounting technician 1959-10-03 888022315787 35fd7db657d7e30dd608c37f7798186e 1549840400 gas_transport 71.89 0 Griffith LLC 40.313342 -74.220434 2019-02-10 23:13:20 2019-02 2019Q1 2018-12-01 59
In [278]:
pred_trans_df['month'] = pred_trans_df['trans_month_year'].dt.month
pred_trans_df['year'] = pred_trans_df['trans_month_year'].dt.year

Monthly Spending¶

In [279]:
monthly_spending = pred_trans_df.groupby(['year', 'month']).agg({'amt': ['sum', 'mean', 'max', 'min'],
                                                                 'trans_num': 'count'}).reset_index()
monthly_spending.columns = ['year', 'month', 'total_amt', 'mean_amt', 'max_amt', 'min_amt', 'trans_count']
In [280]:
monthly_spending.tail()
Out[280]:
year month total_amt mean_amt max_amt min_amt trans_count
44 2022 8 9810985.07 63.096803 24583.86 1.0 155491
45 2022 9 7837465.56 62.719293 16460.30 1.0 124961
46 2022 10 8217907.95 62.749845 25159.92 1.0 130963
47 2022 11 8046118.86 64.152884 23235.32 1.0 125421
48 2022 12 16418548.14 63.093177 23949.46 1.0 260227
In [281]:
sns.lineplot(data=monthly_spending, x='month', y='total_amt', hue='year', palette=palette)
plt.ylabel('Total Monthly Spending ($)')
plt.ticklabel_format(style='plain', axis='y')
plt.yticks(np.arange(0, 1.8e7, 0.2e7))
plt.title('Total Monthly Spending Across Customers')

plt.show()

It seems to be a seasonal trend of spending. The data shows a yearly cylce of higher spending during the months of November, December, and March. It's also evident that the spending in December is much higher than the other months. There could be many reasons behind, such as holidays, sales, or promotions.

In [282]:
monthly_spending['year_month'] = monthly_spending['year'].astype(str) + '-' + monthly_spending['month'].astype(str)
In [283]:
fig, ax = plt.subplots(figsize=(18, 6))

ax.plot(monthly_spending['year_month'], monthly_spending['total_amt'], color='black')
ax.set_xlabel('Year-Month')

ax.set_xlabel('Year-Month', fontsize=12)
ax.tick_params(axis='x', rotation=45)
ax.set_ylabel('Total Spending ($)', fontsize=12)
ax.yaxis.set_major_formatter('${:.0f}'.format)

ax.set_title('Total Monthly Spending', fontsize=14)
Out[283]:
Text(0.5, 1.0, 'Total Monthly Spending')
In [284]:
fig, ax = plt.subplots(figsize=(18, 6))
ax2 = ax.twinx()

ax.plot(monthly_spending['year_month'], monthly_spending['total_amt'], color='black')
ax.set_xlabel('Year-Month', fontsize=12)
ax.tick_params(axis='x', rotation=45)
ax.set_ylabel('Total Spending ($)', fontsize=12)
ax.yaxis.set_major_formatter('${:.0f}'.format)
ax.set_title('Total Monthly Spending and Transactions', fontsize=14)

ax2.bar(monthly_spending['year_month'], monthly_spending['trans_count'], color='grey', alpha=0.5)
ax2.set_ylabel('Transaction Count', fontsize=12)

plt.show()
In [285]:
fig, ax = plt.subplots(figsize=(18, 6))

ax.plot(monthly_spending['year_month'], monthly_spending['total_amt'], color='black')

ax.axvspan('2019-11', '2020-1', color='red', alpha=0.1)
ax.axvspan('2020-11', '2021-1', color='red', alpha=0.1)
ax.axvspan('2021-11', '2022-1', color='red', alpha=0.1)
ax.axvspan('2022-11', '2022-12', color='red', alpha=0.1)

ax.set_xlabel('Year-Month', fontsize=12)
ax.tick_params(axis='x', rotation=45)
ax.set_ylabel('Total Spending ($)', fontsize=12)
ax.yaxis.set_major_formatter('${:.0f}'.format)

ax.set_title('Total Monthly Spending', fontsize=14)

plt.show()
In [286]:
fig, ax = plt.subplots(figsize=(18, 6))

ax.plot(monthly_spending['year_month'], monthly_spending['total_amt'], color='black')

ax2 = ax.twinx()
ax2.bar(monthly_spending['year_month'], monthly_spending['trans_count'], color='grey', alpha=0.5)
ax2.set_ylabel('Total Transactions', fontsize=12)
ax2.yaxis.set_major_formatter('{:.0f}'.format)

ax.axvspan('2019-11', '2020-1', color='red', alpha=0.1)
ax.axvspan('2020-11', '2021-1', color='red', alpha=0.1)
ax.axvspan('2021-11', '2022-1', color='red', alpha=0.1)
ax.axvspan('2022-11', '2022-12', color='red', alpha=0.1)

ax.set_xlabel('Year-Month', fontsize=12)
ax.tick_params(axis='x', rotation=45)
ax.set_ylabel('Total Spending ($)', fontsize=12)
ax.yaxis.set_major_formatter('${:.0f}'.format)

ax.legend(['Total Spending'], loc='upper left')
ax2.legend(['Total Transactions'], loc='upper right')
plt.title('Total Monthly Spending and Transactions', fontsize=14)

plt.show()

We will group the data by acct_num, and trans_month_year and find the total amount of spending per month, total number of transaction, and the most frequent category.¶

In [287]:
total_spent_per_month = pred_trans_df.groupby(['acct_num', 'trans_month_year']).agg({
    'category': lambda x: x.value_counts().idxmax(),
    'age': 'min',
    'gender': 'first',
    'job': 'first',
    'city': 'first',
    'state': 'first',
    'zip': 'first',
    'trans_num': 'count',
    'amt': ['mean', 'max', 'min', 'sum']
}).reset_index()
In [288]:
total_spent_per_month.columns = ['acct_num', 'trans_month_year', 'category', 'age', 'gender', 'job', 'city', 'state', 'zip', 'trans_count', 'mean_amt', 'max_amt', 'min_amt', 'total_amt']
In [289]:
total_spent_per_month.head()
Out[289]:
acct_num trans_month_year category age gender job city state zip trans_count mean_amt max_amt min_amt total_amt
0 2348758451 2018-12 gas_transport 42 M Surveyor, minerals Rochester NY 14621 1 96.050000 96.05 96.05 96.05
1 2348758451 2019-01 gas_transport 42 M Surveyor, minerals Rochester NY 14621 40 78.377750 359.87 5.67 3135.11
2 2348758451 2019-02 gas_transport 42 M Surveyor, minerals Rochester NY 14621 49 62.837143 245.29 4.41 3079.02
3 2348758451 2019-03 gas_transport 42 M Surveyor, minerals Rochester NY 14621 57 54.632456 131.91 1.73 3114.05
4 2348758451 2019-04 gas_transport 42 M Surveyor, minerals Rochester NY 14621 60 69.893500 1183.46 1.27 4193.61
In [290]:
fig, axes = plt.subplots(6, 1, figsize=(16, 18))

for i, c in enumerate(['total_amt', 'mean_amt', 'trans_count', 'age', 'gender', 'state']):
    sns.histplot(data=total_spent_per_month[c], ax=axes[i], kde=True, color='orange')

plt.tight_layout()
plt.show()
In [291]:
customer_summary = pd.pivot_table(total_spent_per_month, 
                                  index=['acct_num'], 
                                  columns=['trans_month_year'], 
                                  values=['trans_count', 'total_amt'],
                                  fill_value=0)
In [292]:
customer_summary_flat = pd.DataFrame(customer_summary.to_records())

customer_summary_flat.columns = [col.replace("('", "").replace("', '", "_").replace("'))", "") 
                                 for col in customer_summary_flat.columns]
In [293]:
customer_summary_flat.head()
Out[293]:
acct_num total_amt', Period2018-12_M total_amt', Period2019-01_M total_amt', Period2019-02_M total_amt', Period2019-03_M total_amt', Period2019-04_M total_amt', Period2019-05_M total_amt', Period2019-06_M total_amt', Period2019-07_M total_amt', Period2019-08_M total_amt', Period2019-09_M total_amt', Period2019-10_M total_amt', Period2019-11_M total_amt', Period2019-12_M total_amt', Period2020-01_M total_amt', Period2020-02_M total_amt', Period2020-03_M total_amt', Period2020-04_M total_amt', Period2020-05_M total_amt', Period2020-06_M total_amt', Period2020-07_M total_amt', Period2020-08_M total_amt', Period2020-09_M total_amt', Period2020-10_M total_amt', Period2020-11_M total_amt', Period2020-12_M total_amt', Period2021-01_M total_amt', Period2021-02_M total_amt', Period2021-03_M total_amt', Period2021-04_M total_amt', Period2021-05_M total_amt', Period2021-06_M total_amt', Period2021-07_M total_amt', Period2021-08_M total_amt', Period2021-09_M total_amt', Period2021-10_M total_amt', Period2021-11_M total_amt', Period2021-12_M total_amt', Period2022-01_M total_amt', Period2022-02_M total_amt', Period2022-03_M total_amt', Period2022-04_M total_amt', Period2022-05_M total_amt', Period2022-06_M total_amt', Period2022-07_M total_amt', Period2022-08_M total_amt', Period2022-09_M total_amt', Period2022-10_M total_amt', Period2022-11_M total_amt', Period2022-12_M trans_count', Period2018-12_M trans_count', Period2019-01_M trans_count', Period2019-02_M trans_count', Period2019-03_M trans_count', Period2019-04_M trans_count', Period2019-05_M trans_count', Period2019-06_M trans_count', Period2019-07_M trans_count', Period2019-08_M trans_count', Period2019-09_M trans_count', Period2019-10_M trans_count', Period2019-11_M trans_count', Period2019-12_M trans_count', Period2020-01_M trans_count', Period2020-02_M trans_count', Period2020-03_M trans_count', Period2020-04_M trans_count', Period2020-05_M trans_count', Period2020-06_M trans_count', Period2020-07_M trans_count', Period2020-08_M trans_count', Period2020-09_M trans_count', Period2020-10_M trans_count', Period2020-11_M trans_count', Period2020-12_M trans_count', Period2021-01_M trans_count', Period2021-02_M trans_count', Period2021-03_M trans_count', Period2021-04_M trans_count', Period2021-05_M trans_count', Period2021-06_M trans_count', Period2021-07_M trans_count', Period2021-08_M trans_count', Period2021-09_M trans_count', Period2021-10_M trans_count', Period2021-11_M trans_count', Period2021-12_M trans_count', Period2022-01_M trans_count', Period2022-02_M trans_count', Period2022-03_M trans_count', Period2022-04_M trans_count', Period2022-05_M trans_count', Period2022-06_M trans_count', Period2022-07_M trans_count', Period2022-08_M trans_count', Period2022-09_M trans_count', Period2022-10_M trans_count', Period2022-11_M trans_count', Period2022-12_M
0 2348758451 96.05 3135.11 3079.02 3114.05 4193.61 4290.73 3657.02 3765.51 4004.26 2416.59 3962.87 3124.60 6008.22 3077.01 2998.80 3586.09 2436.95 3604.75 4707.65 2372.02 3111.07 6454.39 1679.22 2026.95 6503.73 1753.81 1483.12 4791.39 1432.27 2754.88 2991.87 3684.23 2806.12 2531.25 1559.79 2112.00 5953.41 5345.96 10263.91 14583.42 7094.84 7880.77 7649.20 9317.50 6850.65 6256.74 6588.37 8882.73 13408.74 1 40 49 57 60 75 67 70 68 51 57 45 102 44 53 56 44 54 71 56 66 45 51 50 118 46 34 50 47 50 71 76 62 54 42 51 119 97 104 150 136 144 160 190 159 135 152 127 285
1 2468061102 5.75 4793.59 2950.34 5128.18 5516.67 5324.35 7142.46 6812.52 6650.47 4145.92 5302.92 6493.92 11419.12 4553.61 3378.39 3979.67 5257.87 6132.76 6587.81 5275.80 5193.65 6320.92 4682.45 4766.77 9564.12 3048.51 2648.57 5633.02 7646.09 5595.21 4295.62 6773.62 4391.10 3689.29 4440.16 4611.52 10070.20 7917.40 5189.47 11352.79 9277.44 9837.36 13240.34 12826.66 11121.19 10107.50 11085.32 12425.87 18979.94 2 71 53 82 77 98 114 115 106 83 74 90 158 58 60 58 86 112 110 94 97 82 80 63 155 62 52 78 89 101 86 116 80 59 82 88 181 130 100 174 151 180 217 201 209 179 191 160 310
2 3005591724 9.47 1428.09 2065.59 2644.69 1240.21 2920.90 3629.99 2437.47 2174.59 706.39 1851.48 4132.25 3117.27 908.63 1059.99 2246.52 1612.55 1736.92 2107.86 1925.91 1558.20 1941.24 1335.45 1638.87 2912.18 2165.87 1343.19 1288.28 1336.99 2009.25 2490.44 2196.33 1910.26 970.84 5580.45 1424.08 3368.60 3404.76 4504.15 8251.73 6499.15 6581.49 6828.38 8442.24 8530.51 6109.92 6252.24 4902.91 11494.04 1 23 21 30 25 34 31 39 38 14 16 33 56 16 20 27 26 35 38 38 26 21 25 17 47 27 22 30 19 38 32 34 34 21 41 28 67 74 69 107 108 102 124 159 142 122 119 110 208
3 3418322859 0.00 5829.97 4646.60 6703.80 5745.56 6947.46 7834.35 8241.81 8215.51 6083.78 6130.85 4561.81 10911.14 5669.65 2633.59 3986.01 5264.24 5054.47 7329.56 5746.32 5304.16 7184.48 6439.11 3028.10 12515.35 5531.93 1924.89 5326.26 4138.59 4724.03 7412.62 7566.54 8705.06 4191.34 5032.46 3268.44 14907.34 4216.82 2783.59 5663.01 9328.76 7270.09 13990.58 7839.01 7342.55 6141.45 7249.93 5236.27 16095.44 0 97 80 114 103 120 133 140 144 105 107 112 212 83 66 90 115 142 130 144 134 123 107 87 208 93 66 113 107 123 132 146 145 111 136 92 229 87 86 141 159 150 162 182 144 127 136 149 286
4 4322238535 0.00 1369.44 1250.73 1634.28 2678.31 2658.26 2187.63 1445.66 4178.49 3196.41 1348.05 1565.03 5342.98 2539.69 918.26 1416.10 2397.57 1955.85 2892.05 1755.64 2013.18 1357.14 1609.49 1648.21 5087.10 737.62 1031.34 1175.36 981.97 1497.06 2837.26 2479.07 1076.47 1820.65 1609.43 2051.37 2576.63 1672.36 2434.82 3356.19 3257.14 4866.14 4084.15 3371.14 3609.39 4716.26 3044.20 3551.07 7364.31 0 20 19 23 44 28 37 29 36 25 18 30 53 23 14 33 34 32 35 30 27 25 30 28 44 18 20 17 23 28 37 34 32 29 34 26 57 31 37 54 53 65 64 73 66 72 55 56 120
In [294]:
customer_summary_flat = customer_summary_flat.rename(columns={"total_amt', Period2022-12_M": "Target"})
In [295]:
fig, axes = plt.subplots(8, 6, figsize=(25, 25))
axes = [ax for axes_row in axes for ax in axes_row]

for i, c in enumerate(customer_summary_flat.loc[:, "total_amt', Period2018-12_M":"total_amt', Period2022-11_M"]):
    sns.scatterplot(y="Target", x=c, data=customer_summary_flat, ax=axes[i], color='orange')
    sns.regplot(y="Target", x=c, data=customer_summary_flat, ax=axes[i], scatter=False, color='red')
    corr = customer_summary_flat['Target'].corr(customer_summary_flat[c])
    axes[i].set_title('Corr: {:.2f}'.format(corr), fontsize=12)

plt.tight_layout()
plt.show()
In [296]:
fig, axes = plt.subplots(8, 6, figsize=(25, 25))
axes = [ax for axes_row in axes for ax in axes_row]

for i, c in enumerate(customer_summary_flat.loc[:, "total_amt', Period2018-12_M":"total_amt', Period2022-11_M"]):
    sns.boxplot(x=c, data=customer_summary_flat, ax=axes[i], color='orange', flierprops=dict(markerfacecolor='red'))
    
plt.tight_layout()
plt.show()
In [297]:
fig, axes = plt.subplots(7, 7, figsize=(25, 25))
axes = [ax for axes_row in axes for ax in axes_row]

for i, c in enumerate(customer_summary_flat.loc[:, "trans_count', Period2018-12_M":"trans_count', Period2022-12_M"]):
    sns.scatterplot(y="Target", x=c, data=customer_summary_flat, ax=axes[i], color='orange')
    sns.regplot(y="Target", x=c, data=customer_summary_flat, ax=axes[i], scatter=False, color='red')
    corr = customer_summary_flat['Target'].corr(customer_summary_flat[c])
    axes[i].set_title('Corr: {:.2f}'.format(corr), fontsize=12)

plt.tight_layout()
plt.show()
In [298]:
fig, axes = plt.subplots(7, 7, figsize=(25, 25))
axes = [ax for axes_row in axes for ax in axes_row]

for i, c in enumerate(customer_summary_flat.loc[:, "trans_count', Period2018-12_M":"trans_count', Period2022-12_M"]):
    sns.boxplot(x=c, data=customer_summary_flat, ax=axes[i], color='orange', flierprops=dict(markerfacecolor='red'))

plt.tight_layout()
plt.show()
In [299]:
customer_demographics = pred_trans_df[['acct_num', 'gender', 'age', 'job', 'city']]
In [300]:
trans_df = pd.merge(customer_summary_flat, customer_demographics, on='acct_num')
In [301]:
trans_df.drop_duplicates(inplace=True)
In [302]:
trans_df.reset_index(drop=True, inplace=True)
In [303]:
bins = [0, 18, 24, 34, 44, 54, 64, 200]
labels = ['<18', '18-24', '25-34', '35-44', '45-54', '55-64', '65+']
In [304]:
trans_df['age_group'] = pd.cut(trans_df['age'], bins=bins, labels=labels)
In [305]:
target_index = trans_df.columns.get_loc("Target")

cols = list(trans_df.columns)
cols.append(cols.pop(target_index))
trans_df = trans_df[cols]
In [306]:
trans_df.head()
Out[306]:
acct_num total_amt', Period2018-12_M total_amt', Period2019-01_M total_amt', Period2019-02_M total_amt', Period2019-03_M total_amt', Period2019-04_M total_amt', Period2019-05_M total_amt', Period2019-06_M total_amt', Period2019-07_M total_amt', Period2019-08_M total_amt', Period2019-09_M total_amt', Period2019-10_M total_amt', Period2019-11_M total_amt', Period2019-12_M total_amt', Period2020-01_M total_amt', Period2020-02_M total_amt', Period2020-03_M total_amt', Period2020-04_M total_amt', Period2020-05_M total_amt', Period2020-06_M total_amt', Period2020-07_M total_amt', Period2020-08_M total_amt', Period2020-09_M total_amt', Period2020-10_M total_amt', Period2020-11_M total_amt', Period2020-12_M total_amt', Period2021-01_M total_amt', Period2021-02_M total_amt', Period2021-03_M total_amt', Period2021-04_M total_amt', Period2021-05_M total_amt', Period2021-06_M total_amt', Period2021-07_M total_amt', Period2021-08_M total_amt', Period2021-09_M total_amt', Period2021-10_M total_amt', Period2021-11_M total_amt', Period2021-12_M total_amt', Period2022-01_M total_amt', Period2022-02_M total_amt', Period2022-03_M total_amt', Period2022-04_M total_amt', Period2022-05_M total_amt', Period2022-06_M total_amt', Period2022-07_M total_amt', Period2022-08_M total_amt', Period2022-09_M total_amt', Period2022-10_M total_amt', Period2022-11_M trans_count', Period2018-12_M trans_count', Period2019-01_M trans_count', Period2019-02_M trans_count', Period2019-03_M trans_count', Period2019-04_M trans_count', Period2019-05_M trans_count', Period2019-06_M trans_count', Period2019-07_M trans_count', Period2019-08_M trans_count', Period2019-09_M trans_count', Period2019-10_M trans_count', Period2019-11_M trans_count', Period2019-12_M trans_count', Period2020-01_M trans_count', Period2020-02_M trans_count', Period2020-03_M trans_count', Period2020-04_M trans_count', Period2020-05_M trans_count', Period2020-06_M trans_count', Period2020-07_M trans_count', Period2020-08_M trans_count', Period2020-09_M trans_count', Period2020-10_M trans_count', Period2020-11_M trans_count', Period2020-12_M trans_count', Period2021-01_M trans_count', Period2021-02_M trans_count', Period2021-03_M trans_count', Period2021-04_M trans_count', Period2021-05_M trans_count', Period2021-06_M trans_count', Period2021-07_M trans_count', Period2021-08_M trans_count', Period2021-09_M trans_count', Period2021-10_M trans_count', Period2021-11_M trans_count', Period2021-12_M trans_count', Period2022-01_M trans_count', Period2022-02_M trans_count', Period2022-03_M trans_count', Period2022-04_M trans_count', Period2022-05_M trans_count', Period2022-06_M trans_count', Period2022-07_M trans_count', Period2022-08_M trans_count', Period2022-09_M trans_count', Period2022-10_M trans_count', Period2022-11_M trans_count', Period2022-12_M gender age job city age_group Target
0 2348758451 96.05 3135.11 3079.02 3114.05 4193.61 4290.73 3657.02 3765.51 4004.26 2416.59 3962.87 3124.60 6008.22 3077.01 2998.80 3586.09 2436.95 3604.75 4707.65 2372.02 3111.07 6454.39 1679.22 2026.95 6503.73 1753.81 1483.12 4791.39 1432.27 2754.88 2991.87 3684.23 2806.12 2531.25 1559.79 2112.00 5953.41 5345.96 10263.91 14583.42 7094.84 7880.77 7649.20 9317.50 6850.65 6256.74 6588.37 8882.73 1 40 49 57 60 75 67 70 68 51 57 45 102 44 53 56 44 54 71 56 66 45 51 50 118 46 34 50 47 50 71 76 62 54 42 51 119 97 104 150 136 144 160 190 159 135 152 127 285 M 42 Surveyor, minerals Rochester 35-44 13408.74
1 2468061102 5.75 4793.59 2950.34 5128.18 5516.67 5324.35 7142.46 6812.52 6650.47 4145.92 5302.92 6493.92 11419.12 4553.61 3378.39 3979.67 5257.87 6132.76 6587.81 5275.80 5193.65 6320.92 4682.45 4766.77 9564.12 3048.51 2648.57 5633.02 7646.09 5595.21 4295.62 6773.62 4391.10 3689.29 4440.16 4611.52 10070.20 7917.40 5189.47 11352.79 9277.44 9837.36 13240.34 12826.66 11121.19 10107.50 11085.32 12425.87 2 71 53 82 77 98 114 115 106 83 74 90 158 58 60 58 86 112 110 94 97 82 80 63 155 62 52 78 89 101 86 116 80 59 82 88 181 130 100 174 151 180 217 201 209 179 191 160 310 F 60 Nurse, adult Oceanside 55-64 18979.94
2 3005591724 9.47 1428.09 2065.59 2644.69 1240.21 2920.90 3629.99 2437.47 2174.59 706.39 1851.48 4132.25 3117.27 908.63 1059.99 2246.52 1612.55 1736.92 2107.86 1925.91 1558.20 1941.24 1335.45 1638.87 2912.18 2165.87 1343.19 1288.28 1336.99 2009.25 2490.44 2196.33 1910.26 970.84 5580.45 1424.08 3368.60 3404.76 4504.15 8251.73 6499.15 6581.49 6828.38 8442.24 8530.51 6109.92 6252.24 4902.91 1 23 21 30 25 34 31 39 38 14 16 33 56 16 20 27 26 35 38 38 26 21 25 17 47 27 22 30 19 38 32 34 34 21 41 28 67 74 69 107 108 102 124 159 142 122 119 110 208 F 73 Engineer, automotive Lancaster 65+ 11494.04
3 3418322859 0.00 5829.97 4646.60 6703.80 5745.56 6947.46 7834.35 8241.81 8215.51 6083.78 6130.85 4561.81 10911.14 5669.65 2633.59 3986.01 5264.24 5054.47 7329.56 5746.32 5304.16 7184.48 6439.11 3028.10 12515.35 5531.93 1924.89 5326.26 4138.59 4724.03 7412.62 7566.54 8705.06 4191.34 5032.46 3268.44 14907.34 4216.82 2783.59 5663.01 9328.76 7270.09 13990.58 7839.01 7342.55 6141.45 7249.93 5236.27 0 97 80 114 103 120 133 140 144 105 107 112 212 83 66 90 115 142 130 144 134 123 107 87 208 93 66 113 107 123 132 146 145 111 136 92 229 87 86 141 159 150 162 182 144 127 136 149 286 F 17 Operational investment banker Mountain View <18 16095.44
4 4322238535 0.00 1369.44 1250.73 1634.28 2678.31 2658.26 2187.63 1445.66 4178.49 3196.41 1348.05 1565.03 5342.98 2539.69 918.26 1416.10 2397.57 1955.85 2892.05 1755.64 2013.18 1357.14 1609.49 1648.21 5087.10 737.62 1031.34 1175.36 981.97 1497.06 2837.26 2479.07 1076.47 1820.65 1609.43 2051.37 2576.63 1672.36 2434.82 3356.19 3257.14 4866.14 4084.15 3371.14 3609.39 4716.26 3044.20 3551.07 0 20 19 23 44 28 37 29 36 25 18 30 53 23 14 33 34 32 35 30 27 25 30 28 44 18 20 17 23 28 37 34 32 29 34 26 57 31 37 54 53 65 64 73 66 72 55 56 120 M 88 Catering manager Honolulu 65+ 7364.31
In [307]:
sns.boxplot(x='age_group', y='Target', data=trans_df, palette='Set3')
plt.title('Dec 2022 Spending by Age Group')
plt.xticks(rotation=45)
plt.show()
In [308]:
sns.boxplot(x='gender', y='Target', data=trans_df, palette='Set3')
plt.title('Dec 2022 Spending by gender')
plt.xticks(rotation=45)
plt.show()
In [309]:
dec2022 = total_spent_per_month[total_spent_per_month['trans_month_year'] == '2022-12']
In [310]:
dec2022.head()
Out[310]:
acct_num trans_month_year category age gender job city state zip trans_count mean_amt max_amt min_amt total_amt
48 2348758451 2022-12 travel 42 M Surveyor, minerals Rochester NY 14621 285 47.048211 1562.15 1.10 13408.74
97 2468061102 2022-12 travel 60 F Nurse, adult Oceanside CA 92057 310 61.225613 1748.31 1.02 18979.94
146 3005591724 2022-12 travel 73 F Engineer, automotive Lancaster PA 17601 208 55.259808 384.94 1.05 11494.04
194 3418322859 2022-12 travel 17 F Operational investment banker Mountain View CA 94040 286 56.277762 1928.42 1.01 16095.44
242 4322238535 2022-12 travel 88 M Catering manager Honolulu HI 96816 120 61.369250 245.56 1.10 7364.31
In [311]:
sns.boxplot(x='category', y='total_amt', data=dec2022, palette='Set3')
plt.title('Dec 2022 Spending by Category')
plt.xticks(rotation=45)
plt.show()
In [312]:
dec_2022 = pred_trans_df[pred_trans_df['trans_month_year'] == '2022-12']
In [313]:
pred_trans_df.head()
Out[313]:
ssn cc_num first last gender street city state zip lat long city_pop job dob acct_num trans_num unix_time category amt is_fraud merchant merch_lat merch_long trans_date_time trans_month_year quarter first_trans_month_year age month year
0 115-04-4507 4218196001337 Jonathan Johnson M 863 Lawrence Valleys Ambler PA 19002 40.1809 -75.2156 32412 Accounting technician 1959-10-03 888022315787 91ab12e73ef38206e1121e9648d2408d 1558719550 gas_transport 69.12 0 Phillips Group 39.491416 -75.588522 2019-05-24 17:39:10 2019-05 2019Q2 2018-12-01 59 5 2019
1 115-04-4507 4218196001337 Jonathan Johnson M 863 Lawrence Valleys Ambler PA 19002 40.1809 -75.2156 32412 Accounting technician 1959-10-03 888022315787 071553d533a6822a4431c354c434ddcb 1569425519 grocery_pos 68.11 0 Tucker Ltd 40.890319 -75.573359 2019-09-25 15:31:59 2019-09 2019Q3 2018-12-01 59 9 2019
2 115-04-4507 4218196001337 Jonathan Johnson M 863 Lawrence Valleys Ambler PA 19002 40.1809 -75.2156 32412 Accounting technician 1959-10-03 888022315787 0cfad38ef15e4749eff68dc83f62c151 1577205601 misc_net 40.35 0 Dixon PLC 39.244958 -74.475327 2019-12-24 16:40:01 2019-12 2019Q4 2018-12-01 59 12 2019
3 115-04-4507 4218196001337 Jonathan Johnson M 863 Lawrence Valleys Ambler PA 19002 40.1809 -75.2156 32412 Accounting technician 1959-10-03 888022315787 5782693d7c70f062f258cb30bfa8900f 1571428238 grocery_pos 96.22 0 Lambert-Cooper 39.656925 -75.802342 2019-10-18 19:50:38 2019-10 2019Q4 2018-12-01 59 10 2019
4 115-04-4507 4218196001337 Jonathan Johnson M 863 Lawrence Valleys Ambler PA 19002 40.1809 -75.2156 32412 Accounting technician 1959-10-03 888022315787 35fd7db657d7e30dd608c37f7798186e 1549840400 gas_transport 71.89 0 Griffith LLC 40.313342 -74.220434 2019-02-10 23:13:20 2019-02 2019Q1 2018-12-01 59 2 2019
In [314]:
trans_df.shape
Out[314]:
(972, 104)

Correlation Plot¶

In [315]:
correlation = trans_df.corr()

plt.figure(figsize=(50, 50))
sns.heatmap(correlation, annot=True, cmap='YlGnBu')

plt.show()
/var/folders/lk/qbgh0syd1l7_7h6ghvq233nr0000gn/T/ipykernel_92512/3535905867.py:1: FutureWarning: The default value of numeric_only in DataFrame.corr is deprecated. In a future version, it will default to False. Select only valid columns or specify the value of numeric_only to silence this warning.
  correlation = trans_df.corr()
In [316]:
correlation
Out[316]:
total_amt', Period2018-12_M total_amt', Period2019-01_M total_amt', Period2019-02_M total_amt', Period2019-03_M total_amt', Period2019-04_M total_amt', Period2019-05_M total_amt', Period2019-06_M total_amt', Period2019-07_M total_amt', Period2019-08_M total_amt', Period2019-09_M total_amt', Period2019-10_M total_amt', Period2019-11_M total_amt', Period2019-12_M total_amt', Period2020-01_M total_amt', Period2020-02_M total_amt', Period2020-03_M total_amt', Period2020-04_M total_amt', Period2020-05_M total_amt', Period2020-06_M total_amt', Period2020-07_M total_amt', Period2020-08_M total_amt', Period2020-09_M total_amt', Period2020-10_M total_amt', Period2020-11_M total_amt', Period2020-12_M total_amt', Period2021-01_M total_amt', Period2021-02_M total_amt', Period2021-03_M total_amt', Period2021-04_M total_amt', Period2021-05_M total_amt', Period2021-06_M total_amt', Period2021-07_M total_amt', Period2021-08_M total_amt', Period2021-09_M total_amt', Period2021-10_M total_amt', Period2021-11_M total_amt', Period2021-12_M total_amt', Period2022-01_M total_amt', Period2022-02_M total_amt', Period2022-03_M total_amt', Period2022-04_M total_amt', Period2022-05_M total_amt', Period2022-06_M total_amt', Period2022-07_M total_amt', Period2022-08_M total_amt', Period2022-09_M total_amt', Period2022-10_M total_amt', Period2022-11_M trans_count', Period2018-12_M trans_count', Period2019-01_M trans_count', Period2019-02_M trans_count', Period2019-03_M trans_count', Period2019-04_M trans_count', Period2019-05_M trans_count', Period2019-06_M trans_count', Period2019-07_M trans_count', Period2019-08_M trans_count', Period2019-09_M trans_count', Period2019-10_M trans_count', Period2019-11_M trans_count', Period2019-12_M trans_count', Period2020-01_M trans_count', Period2020-02_M trans_count', Period2020-03_M trans_count', Period2020-04_M trans_count', Period2020-05_M trans_count', Period2020-06_M trans_count', Period2020-07_M trans_count', Period2020-08_M trans_count', Period2020-09_M trans_count', Period2020-10_M trans_count', Period2020-11_M trans_count', Period2020-12_M trans_count', Period2021-01_M trans_count', Period2021-02_M trans_count', Period2021-03_M trans_count', Period2021-04_M trans_count', Period2021-05_M trans_count', Period2021-06_M trans_count', Period2021-07_M trans_count', Period2021-08_M trans_count', Period2021-09_M trans_count', Period2021-10_M trans_count', Period2021-11_M trans_count', Period2021-12_M trans_count', Period2022-01_M trans_count', Period2022-02_M trans_count', Period2022-03_M trans_count', Period2022-04_M trans_count', Period2022-05_M trans_count', Period2022-06_M trans_count', Period2022-07_M trans_count', Period2022-08_M trans_count', Period2022-09_M trans_count', Period2022-10_M trans_count', Period2022-11_M trans_count', Period2022-12_M age Target
total_amt', Period2018-12_M 1.000000 0.532484 0.522456 0.511195 0.530995 0.517475 0.529913 0.530591 0.521404 0.539699 0.535104 0.520855 0.523571 0.515719 0.529609 0.513539 0.485701 0.446822 0.445354 0.501977 0.477245 0.486197 0.466752 0.458473 0.505537 0.475332 0.415254 0.482738 0.477579 0.460709 0.461116 0.478910 0.475206 0.483292 0.476297 0.446306 0.471941 0.425631 0.423659 0.413413 0.420878 0.433582 0.408367 0.437908 0.399066 0.419300 0.393458 0.405663 0.758131 0.452343 0.462406 0.458912 0.468279 0.454956 0.462828 0.464256 0.456326 0.470820 0.459396 0.458764 0.461434 0.470286 0.475649 0.457420 0.453956 0.449220 0.464791 0.470284 0.468283 0.471719 0.454737 0.459383 0.472859 0.462823 0.441932 0.469830 0.460774 0.456129 0.464774 0.463455 0.463294 0.458722 0.466350 0.446187 0.463210 0.391451 0.396722 0.382438 0.384457 0.380524 0.390909 0.401394 0.383796 0.388190 0.382005 0.384506 0.386503 -0.183845 0.434274
total_amt', Period2019-01_M 0.532484 1.000000 0.916214 0.937136 0.938201 0.937005 0.944725 0.945612 0.947833 0.937074 0.938838 0.933687 0.949845 0.927369 0.914979 0.935022 0.856693 0.842186 0.808883 0.849753 0.835867 0.824464 0.821580 0.800321 0.875595 0.800278 0.773230 0.837261 0.801300 0.808359 0.837400 0.847710 0.778866 0.797004 0.834971 0.789355 0.866196 0.738596 0.723252 0.710342 0.727539 0.730376 0.716978 0.706839 0.712005 0.723632 0.685961 0.683284 0.469361 0.867305 0.827236 0.834492 0.837639 0.832431 0.842734 0.842286 0.843911 0.836527 0.843813 0.847733 0.845453 0.835428 0.840678 0.838266 0.831728 0.832231 0.843920 0.840544 0.832978 0.843401 0.831950 0.838680 0.850894 0.826641 0.825560 0.838643 0.828041 0.834356 0.837536 0.833660 0.835547 0.837525 0.839388 0.830636 0.838665 0.711435 0.688998 0.678354 0.681420 0.693669 0.698531 0.689756 0.689761 0.684164 0.692214 0.681917 0.691093 -0.314821 0.756122
total_amt', Period2019-02_M 0.522456 0.916214 1.000000 0.928625 0.933851 0.932109 0.938462 0.935318 0.933530 0.935102 0.935894 0.921929 0.936889 0.929482 0.906812 0.919464 0.854616 0.836868 0.801239 0.845848 0.824963 0.813410 0.814041 0.794866 0.864707 0.790995 0.767196 0.836871 0.796464 0.800576 0.832140 0.839360 0.790242 0.791644 0.824704 0.786317 0.869827 0.720941 0.714472 0.697890 0.707402 0.716387 0.700699 0.696530 0.703610 0.710236 0.670724 0.671849 0.462963 0.821353 0.870161 0.830100 0.834151 0.828582 0.838449 0.836692 0.833723 0.835141 0.837570 0.837294 0.837470 0.841366 0.830363 0.831369 0.826092 0.831807 0.837182 0.835129 0.827187 0.833601 0.825370 0.834284 0.843543 0.821560 0.816118 0.829301 0.819529 0.829209 0.833878 0.829774 0.835839 0.831459 0.833806 0.827235 0.834301 0.692535 0.673323 0.661588 0.663667 0.677460 0.675784 0.674317 0.670360 0.668236 0.671368 0.666378 0.672320 -0.308327 0.736216
total_amt', Period2019-03_M 0.511195 0.937136 0.928625 1.000000 0.944329 0.946561 0.953934 0.955878 0.952494 0.944929 0.949748 0.942166 0.952763 0.929830 0.914640 0.936724 0.866597 0.849458 0.812309 0.860593 0.849638 0.835887 0.828208 0.827572 0.877416 0.800524 0.781997 0.842405 0.809728 0.821268 0.847173 0.841998 0.788268 0.807852 0.843001 0.787829 0.878504 0.744890 0.729361 0.718262 0.728907 0.732597 0.720922 0.711025 0.717676 0.725661 0.692241 0.684701 0.468197 0.835327 0.837044 0.870714 0.844074 0.842022 0.852381 0.850870 0.849431 0.848160 0.852769 0.854246 0.852151 0.848167 0.844247 0.847701 0.845664 0.844811 0.850482 0.848804 0.844382 0.853677 0.837464 0.849299 0.856097 0.829909 0.829020 0.845552 0.834987 0.847176 0.846073 0.839410 0.845161 0.849971 0.845954 0.834370 0.849719 0.712484 0.689472 0.680758 0.683252 0.694329 0.697214 0.688379 0.690341 0.682826 0.691742 0.687030 0.691127 -0.317154 0.754889
total_amt', Period2019-04_M 0.530995 0.938201 0.933851 0.944329 1.000000 0.951047 0.951196 0.955496 0.950751 0.947074 0.947989 0.943317 0.954231 0.935805 0.913551 0.941970 0.870734 0.851876 0.816102 0.860193 0.849607 0.833377 0.834691 0.821159 0.879438 0.801753 0.783943 0.841836 0.817693 0.820705 0.845892 0.850282 0.785117 0.794618 0.845026 0.782626 0.876798 0.754006 0.736801 0.723432 0.741517 0.742116 0.720047 0.721130 0.720828 0.729735 0.703767 0.687573 0.466034 0.835619 0.840877 0.843371 0.874956 0.845523 0.852257 0.850693 0.849340 0.846332 0.852477 0.855461 0.852981 0.850313 0.844559 0.850761 0.842492 0.843618 0.851377 0.849786 0.844117 0.852200 0.839617 0.848311 0.857726 0.836577 0.830156 0.841997 0.836232 0.846324 0.846305 0.841950 0.845408 0.845724 0.847967 0.835301 0.846067 0.720980 0.698612 0.691083 0.691615 0.703747 0.701992 0.698905 0.698208 0.690579 0.701867 0.694853 0.699185 -0.318468 0.760828
total_amt', Period2019-05_M 0.517475 0.937005 0.932109 0.946561 0.951047 1.000000 0.953899 0.953930 0.955459 0.946753 0.951378 0.944847 0.957814 0.939190 0.920633 0.939572 0.865240 0.845355 0.823189 0.857898 0.849753 0.841499 0.837901 0.821685 0.881403 0.800141 0.780155 0.840020 0.811467 0.820852 0.845813 0.855812 0.792227 0.808272 0.844530 0.797268 0.880724 0.747312 0.735611 0.716229 0.738930 0.736502 0.731702 0.715696 0.720976 0.726998 0.699281 0.687091 0.465663 0.841041 0.842734 0.847101 0.849373 0.873533 0.854790 0.853106 0.854924 0.850055 0.856715 0.857707 0.858691 0.853196 0.848352 0.849106 0.845087 0.846251 0.854238 0.853096 0.846604 0.858116 0.844983 0.852801 0.861938 0.835303 0.836149 0.846114 0.837999 0.852313 0.847810 0.846822 0.849642 0.849384 0.850287 0.842099 0.852541 0.720615 0.699520 0.687619 0.693059 0.703482 0.705345 0.700335 0.699899 0.690638 0.701191 0.692597 0.699547 -0.320345 0.762687
total_amt', Period2019-06_M 0.529913 0.944725 0.938462 0.953934 0.951196 0.953899 1.000000 0.962359 0.960953 0.956463 0.953610 0.951268 0.963040 0.942800 0.930568 0.941453 0.876298 0.853907 0.817987 0.863368 0.848563 0.836587 0.833242 0.824595 0.879509 0.804130 0.782277 0.846208 0.810119 0.823106 0.847522 0.850500 0.794651 0.814241 0.843420 0.791439 0.886241 0.749527 0.737941 0.718949 0.739820 0.737101 0.720356 0.714604 0.719936 0.725851 0.694761 0.691536 0.468306 0.840378 0.838535 0.846845 0.845405 0.844163 0.874446 0.852801 0.851066 0.848782 0.851791 0.857041 0.854831 0.847633 0.848464 0.846618 0.842380 0.845728 0.856488 0.851485 0.844402 0.851387 0.841978 0.850541 0.858753 0.833486 0.832801 0.846650 0.834347 0.845410 0.846228 0.842755 0.848757 0.846544 0.850755 0.837605 0.848754 0.714582 0.693278 0.683926 0.686240 0.697467 0.698274 0.694527 0.693276 0.684839 0.695884 0.688481 0.693433 -0.327645 0.758815
total_amt', Period2019-07_M 0.530591 0.945612 0.935318 0.955878 0.955496 0.953930 0.962359 1.000000 0.960179 0.953639 0.955936 0.951293 0.963679 0.939604 0.928270 0.948570 0.872090 0.856496 0.817224 0.862849 0.851660 0.843389 0.842050 0.825836 0.882305 0.805844 0.788315 0.846252 0.817480 0.818173 0.849906 0.858493 0.787473 0.814664 0.851908 0.796234 0.884223 0.752446 0.739745 0.725418 0.745081 0.744286 0.723619 0.726857 0.726148 0.730342 0.707365 0.697302 0.478752 0.841287 0.841762 0.848015 0.847243 0.845775 0.857282 0.876209 0.850850 0.850547 0.853757 0.859127 0.857003 0.851245 0.848025 0.852657 0.844680 0.847354 0.857067 0.853267 0.845227 0.856910 0.842750 0.854140 0.860508 0.833541 0.834782 0.846012 0.837847 0.850273 0.848157 0.844969 0.848376 0.849497 0.852688 0.838165 0.852593 0.718733 0.700187 0.690842 0.692114 0.703907 0.705526 0.702023 0.700026 0.693268 0.703741 0.692095 0.700103 -0.330162 0.767414
total_amt', Period2019-08_M 0.521404 0.947833 0.933530 0.952494 0.950751 0.955459 0.960953 0.960179 1.000000 0.954425 0.955215 0.949453 0.961861 0.943884 0.927275 0.943664 0.869102 0.850315 0.823379 0.862467 0.846218 0.836732 0.831373 0.821638 0.884477 0.808699 0.792435 0.846651 0.820024 0.819695 0.852928 0.863354 0.789950 0.810299 0.851879 0.793491 0.891064 0.748756 0.734571 0.710101 0.731262 0.734056 0.716172 0.715725 0.715935 0.725794 0.694738 0.690413 0.467535 0.843100 0.840316 0.848133 0.846596 0.845386 0.857034 0.853536 0.872691 0.852616 0.852823 0.858899 0.856632 0.854692 0.850126 0.850201 0.845209 0.845661 0.856239 0.853515 0.845346 0.856603 0.841874 0.855877 0.860508 0.834926 0.836035 0.847280 0.837990 0.849375 0.849608 0.845519 0.849981 0.851840 0.851657 0.838768 0.852516 0.715572 0.692499 0.680239 0.683681 0.696728 0.694803 0.692115 0.691264 0.685406 0.693420 0.685243 0.691406 -0.318723 0.757570
total_amt', Period2019-09_M 0.539699 0.937074 0.935102 0.944929 0.947074 0.946753 0.956463 0.953639 0.954425 1.000000 0.953389 0.943442 0.954467 0.932251 0.919621 0.935227 0.866395 0.844398 0.817040 0.857196 0.839314 0.841009 0.837542 0.819573 0.875705 0.809148 0.775294 0.840272 0.809211 0.813777 0.840542 0.849302 0.780298 0.799661 0.844426 0.787031 0.876235 0.748833 0.734449 0.715830 0.737366 0.736075 0.723171 0.719929 0.712678 0.726541 0.698218 0.684080 0.484111 0.834109 0.839286 0.843196 0.840499 0.841219 0.849991 0.847656 0.848204 0.872030 0.851707 0.850842 0.849773 0.845091 0.843241 0.842754 0.837955 0.840443 0.848818 0.848264 0.842051 0.853002 0.839597 0.845751 0.854474 0.833616 0.824318 0.840854 0.833288 0.842423 0.842850 0.841096 0.844746 0.842654 0.846919 0.833687 0.846134 0.717429 0.694365 0.682877 0.688189 0.696022 0.697622 0.696155 0.692092 0.685192 0.695329 0.687185 0.694720 -0.324954 0.761886
total_amt', Period2019-10_M 0.535104 0.938838 0.935894 0.949748 0.947989 0.951378 0.953610 0.955936 0.955215 0.953389 1.000000 0.943223 0.956162 0.936699 0.923176 0.939573 0.861699 0.844712 0.815067 0.862132 0.845172 0.839838 0.837094 0.818562 0.881798 0.804190 0.780733 0.841768 0.814384 0.809214 0.847662 0.848022 0.790080 0.805233 0.837020 0.788109 0.878820 0.746373 0.727265 0.713202 0.731437 0.733245 0.716529 0.714106 0.716521 0.719490 0.690752 0.684669 0.482206 0.839741 0.842873 0.847223 0.846124 0.846933 0.854744 0.852597 0.852617 0.852242 0.875611 0.856160 0.855043 0.854185 0.847986 0.848301 0.841783 0.844232 0.853699 0.850074 0.845952 0.857104 0.841540 0.851632 0.859154 0.833951 0.831182 0.846880 0.837649 0.845420 0.847209 0.843144 0.847269 0.845240 0.851644 0.835088 0.848271 0.715502 0.686305 0.681231 0.682674 0.695304 0.697443 0.692981 0.692508 0.681990 0.692692 0.686582 0.691326 -0.317660 0.754560
total_amt', Period2019-11_M 0.520855 0.933687 0.921929 0.942166 0.943317 0.944847 0.951268 0.951293 0.949453 0.943442 0.943223 1.000000 0.955128 0.931295 0.926052 0.936454 0.865690 0.845800 0.806814 0.852161 0.834909 0.838404 0.826590 0.811670 0.870316 0.787800 0.775049 0.836132 0.805849 0.810927 0.840910 0.844699 0.778445 0.799951 0.835893 0.783851 0.870674 0.751966 0.737661 0.714785 0.740507 0.739898 0.722863 0.722330 0.721674 0.722827 0.701690 0.689194 0.467490 0.822596 0.822838 0.830398 0.831155 0.828212 0.839350 0.838255 0.836124 0.834853 0.835944 0.867190 0.841385 0.833898 0.836569 0.833486 0.827109 0.829316 0.838042 0.837108 0.827290 0.843653 0.824964 0.835540 0.844236 0.819112 0.818102 0.831175 0.824937 0.832278 0.833911 0.829447 0.833690 0.833769 0.832424 0.823108 0.835341 0.709497 0.686723 0.678714 0.686633 0.692764 0.694308 0.691582 0.689957 0.679749 0.695174 0.684785 0.689249 -0.312195 0.754430
total_amt', Period2019-12_M 0.523571 0.949845 0.936889 0.952763 0.954231 0.957814 0.963040 0.963679 0.961861 0.954467 0.956162 0.955128 1.000000 0.950758 0.937170 0.946857 0.869203 0.860004 0.826300 0.866345 0.850771 0.842283 0.837709 0.829413 0.888700 0.803908 0.785557 0.850956 0.818786 0.822693 0.848562 0.857952 0.782097 0.808274 0.839210 0.799571 0.888657 0.761777 0.751094 0.728168 0.749447 0.753157 0.734642 0.732053 0.730632 0.740601 0.710046 0.704329 0.472337 0.838494 0.837416 0.840928 0.842828 0.840135 0.850908 0.849921 0.847840 0.844658 0.852335 0.854351 0.864487 0.849414 0.845170 0.846034 0.841669 0.841110 0.853370 0.850083 0.841418 0.852952 0.838939 0.850622 0.858225 0.830021 0.830879 0.843387 0.835539 0.845257 0.845331 0.840239 0.844223 0.845563 0.844685 0.835941 0.847102 0.723941 0.703548 0.692590 0.696505 0.708274 0.710196 0.703130 0.704423 0.695758 0.706967 0.699163 0.703618 -0.313784 0.771768
total_amt', Period2020-01_M 0.515719 0.927369 0.929482 0.929830 0.935805 0.939190 0.942800 0.939604 0.943884 0.932251 0.936699 0.931295 0.950758 1.000000 0.914126 0.929508 0.844622 0.838016 0.813749 0.843616 0.836454 0.828022 0.819903 0.810374 0.865941 0.796519 0.767838 0.837059 0.799749 0.798877 0.833320 0.837513 0.778458 0.805156 0.828551 0.774327 0.867870 0.739898 0.727243 0.707702 0.728251 0.733149 0.714429 0.707699 0.707351 0.724894 0.688702 0.687473 0.449970 0.813013 0.818792 0.813338 0.817815 0.813124 0.825166 0.820867 0.822599 0.819227 0.823674 0.824633 0.824816 0.856438 0.818418 0.819095 0.812018 0.815993 0.829243 0.820633 0.814795 0.827514 0.811494 0.822874 0.830157 0.809665 0.803133 0.820457 0.811332 0.815117 0.819760 0.815666 0.818845 0.821093 0.821475 0.808098 0.823219 0.700432 0.675948 0.664802 0.666009 0.681234 0.677933 0.677273 0.675784 0.668616 0.675463 0.667332 0.675469 -0.305790 0.752379
total_amt', Period2020-02_M 0.529609 0.914979 0.906812 0.914640 0.913551 0.920633 0.930568 0.928270 0.927275 0.919621 0.923176 0.926052 0.937170 0.914126 1.000000 0.916429 0.853743 0.839630 0.809091 0.847194 0.812164 0.820546 0.815475 0.790512 0.860183 0.784185 0.769357 0.830238 0.809469 0.793411 0.837313 0.836509 0.766017 0.786983 0.824944 0.790430 0.868080 0.737732 0.723896 0.703950 0.730249 0.728000 0.719419 0.716416 0.710895 0.717045 0.691949 0.680069 0.469047 0.805934 0.812655 0.808823 0.812738 0.811102 0.819685 0.819231 0.818147 0.818629 0.819330 0.826910 0.823534 0.821145 0.859851 0.816803 0.817136 0.817325 0.825514 0.823889 0.813063 0.823819 0.808763 0.819418 0.829180 0.808745 0.809038 0.820969 0.812729 0.821374 0.824557 0.816931 0.820168 0.821949 0.823526 0.812496 0.823028 0.699454 0.680356 0.667740 0.674279 0.681749 0.685305 0.678660 0.676305 0.671894 0.683872 0.670156 0.675368 -0.291454 0.748464
total_amt', Period2020-03_M 0.513539 0.935022 0.919464 0.936724 0.941970 0.939572 0.941453 0.948570 0.943664 0.935227 0.939573 0.936454 0.946857 0.929508 0.916429 1.000000 0.859649 0.851601 0.809412 0.845353 0.836106 0.832415 0.816521 0.805156 0.877783 0.802319 0.788457 0.843919 0.810127 0.827840 0.842586 0.846562 0.788282 0.801736 0.836418 0.785224 0.869897 0.752503 0.733344 0.718754 0.731418 0.738159 0.715960 0.714486 0.723762 0.734158 0.702185 0.693752 0.464206 0.829202 0.823799 0.825855 0.831412 0.826711 0.834914 0.836654 0.834789 0.831284 0.834757 0.839552 0.835840 0.832837 0.833949 0.859817 0.829212 0.829420 0.840067 0.832099 0.827833 0.839180 0.823068 0.831793 0.846524 0.822310 0.818005 0.831303 0.822874 0.835737 0.834130 0.826848 0.833697 0.831932 0.831107 0.824353 0.836989 0.709346 0.685964 0.677049 0.678187 0.691315 0.688463 0.686814 0.686971 0.679282 0.691428 0.682229 0.687648 -0.316831 0.762595
total_amt', Period2020-04_M 0.485701 0.856693 0.854616 0.866597 0.870734 0.865240 0.876298 0.872090 0.869102 0.866395 0.861699 0.865690 0.869203 0.844622 0.853743 0.859649 1.000000 0.840989 0.801433 0.847051 0.834523 0.828472 0.822152 0.811324 0.866366 0.798959 0.773993 0.838165 0.807955 0.812166 0.835395 0.838280 0.766574 0.782726 0.832518 0.773285 0.859573 0.674635 0.644672 0.631092 0.657191 0.669268 0.672347 0.657171 0.667004 0.674628 0.646981 0.627969 0.485464 0.850917 0.849549 0.852736 0.860531 0.851263 0.862802 0.860017 0.858073 0.855195 0.855024 0.864256 0.855792 0.854218 0.858991 0.855590 0.888219 0.856457 0.863245 0.861838 0.855086 0.860032 0.853030 0.859128 0.866113 0.850791 0.854158 0.861158 0.853805 0.858827 0.855394 0.858764 0.861111 0.859388 0.861739 0.853786 0.862990 0.697603 0.669698 0.657860 0.666771 0.675192 0.673857 0.670431 0.666265 0.660779 0.674390 0.661446 0.671346 -0.248190 0.697025
total_amt', Period2020-05_M 0.446822 0.842186 0.836868 0.849458 0.851876 0.845355 0.853907 0.856496 0.850315 0.844398 0.844712 0.845800 0.860004 0.838016 0.839630 0.851601 0.840989 1.000000 0.806343 0.847871 0.837474 0.813473 0.816868 0.813686 0.861713 0.804348 0.781152 0.826486 0.818018 0.810488 0.834491 0.832271 0.775290 0.795886 0.824940 0.776832 0.859753 0.657608 0.632602 0.603902 0.630553 0.644195 0.652644 0.645686 0.659361 0.657596 0.637444 0.613387 0.479081 0.864858 0.864501 0.867290 0.869178 0.863429 0.874092 0.876971 0.869166 0.866643 0.865995 0.873185 0.877885 0.869979 0.873607 0.873574 0.878455 0.893951 0.877946 0.871269 0.867331 0.866935 0.865063 0.874658 0.877283 0.868204 0.858249 0.863834 0.870206 0.869265 0.869753 0.872193 0.881120 0.875106 0.872996 0.864882 0.879445 0.699806 0.673265 0.662994 0.664106 0.678629 0.679130 0.669480 0.670035 0.673062 0.675682 0.666054 0.674330 -0.280079 0.709183
total_amt', Period2020-06_M 0.445354 0.808883 0.801239 0.812309 0.816102 0.823189 0.817987 0.817224 0.823379 0.817040 0.815067 0.806814 0.826300 0.813749 0.809091 0.809412 0.801433 0.806343 1.000000 0.824093 0.790245 0.784600 0.793989 0.776071 0.844297 0.776442 0.747498 0.795729 0.778834 0.770877 0.796804 0.797548 0.749587 0.745775 0.794994 0.744705 0.829917 0.623147 0.606901 0.585160 0.608398 0.621053 0.645636 0.629752 0.619612 0.638981 0.614928 0.605353 0.472909 0.831444 0.823604 0.827087 0.832778 0.835224 0.834572 0.832209 0.835806 0.834563 0.836785 0.836454 0.837656 0.844153 0.831936 0.835972 0.835339 0.832715 0.866434 0.842639 0.837935 0.832636 0.831023 0.836407 0.843585 0.833885 0.825870 0.836477 0.831869 0.835499 0.835679 0.829580 0.836382 0.834678 0.845524 0.832502 0.841700 0.679121 0.654875 0.643607 0.647125 0.660406 0.663682 0.656154 0.650962 0.647449 0.656761 0.647089 0.655558 -0.253134 0.672751
total_amt', Period2020-07_M 0.501977 0.849753 0.845848 0.860593 0.860193 0.857898 0.863368 0.862849 0.862467 0.857196 0.862132 0.852161 0.866345 0.843616 0.847194 0.845353 0.847051 0.847871 0.824093 1.000000 0.848807 0.823132 0.835570 0.819269 0.864159 0.792638 0.793667 0.836056 0.825752 0.801458 0.843469 0.835033 0.781385 0.783119 0.827048 0.784321 0.856908 0.659585 0.637151 0.614829 0.635527 0.645198 0.662911 0.666026 0.652286 0.662503 0.640796 0.640435 0.510456 0.862340 0.862528 0.872208 0.872781 0.860464 0.875285 0.867922 0.873284 0.870173 0.872540 0.875436 0.870269 0.870584 0.871446 0.867747 0.872674 0.870352 0.878817 0.901283 0.867992 0.872829 0.864773 0.871239 0.876992 0.863269 0.860332 0.872952 0.866749 0.866498 0.871740 0.865096 0.874206 0.870715 0.874834 0.866862 0.877744 0.703439 0.671278 0.666287 0.663314 0.678809 0.675417 0.671938 0.668603 0.667312 0.674469 0.667621 0.673420 -0.250594 0.701580
total_amt', Period2020-08_M 0.477245 0.835867 0.824963 0.849638 0.849607 0.849753 0.848563 0.851660 0.846218 0.839314 0.845172 0.834909 0.850771 0.836454 0.812164 0.836106 0.834523 0.837474 0.790245 0.848807 1.000000 0.806364 0.820203 0.816261 0.852551 0.799862 0.773943 0.822494 0.823116 0.794878 0.838124 0.832145 0.776504 0.774239 0.820864 0.762720 0.856917 0.658404 0.628213 0.614694 0.630875 0.652513 0.656614 0.652862 0.657128 0.659793 0.632172 0.621899 0.501250 0.867221 0.862084 0.872678 0.871187 0.869819 0.874352 0.878364 0.874050 0.868886 0.876637 0.877776 0.877918 0.876997 0.865883 0.875042 0.871847 0.872896 0.878615 0.877998 0.900458 0.879377 0.867329 0.877149 0.880929 0.870218 0.867196 0.877207 0.870659 0.871851 0.870259 0.876273 0.879403 0.870947 0.878606 0.868488 0.881406 0.710189 0.680931 0.677098 0.674186 0.686733 0.684657 0.680064 0.678642 0.679077 0.684896 0.673862 0.681997 -0.262369 0.694828
total_amt', Period2020-09_M 0.486197 0.824464 0.813410 0.835887 0.833377 0.841499 0.836587 0.843389 0.836732 0.841009 0.839838 0.838404 0.842283 0.828022 0.820546 0.832415 0.828472 0.813473 0.784600 0.823132 0.806364 1.000000 0.802783 0.783222 0.847025 0.780185 0.749586 0.822932 0.789009 0.778451 0.818017 0.824686 0.768973 0.778712 0.806085 0.740755 0.838687 0.671684 0.648304 0.621175 0.657036 0.662651 0.686581 0.654940 0.666492 0.664407 0.635497 0.635092 0.490791 0.834040 0.832155 0.843911 0.842371 0.845875 0.850917 0.850918 0.846457 0.848898 0.846841 0.850451 0.847964 0.851757 0.846598 0.847050 0.849668 0.842475 0.854636 0.851212 0.845916 0.882716 0.841136 0.843593 0.857063 0.836720 0.830926 0.854235 0.844610 0.843989 0.849042 0.853595 0.853383 0.848028 0.845503 0.835618 0.856426 0.704659 0.675514 0.669384 0.675292 0.682876 0.683377 0.678496 0.678743 0.674979 0.683370 0.672602 0.677314 -0.263945 0.693890
total_amt', Period2020-10_M 0.466752 0.821580 0.814041 0.828208 0.834691 0.837901 0.833242 0.842050 0.831373 0.837542 0.837094 0.826590 0.837709 0.819903 0.815475 0.816521 0.822152 0.816868 0.793989 0.835570 0.820203 0.802783 1.000000 0.793756 0.851424 0.773435 0.770448 0.817261 0.815349 0.780432 0.818086 0.815409 0.756906 0.780611 0.823262 0.768379 0.840438 0.640238 0.635065 0.598678 0.616873 0.624113 0.643810 0.642746 0.630037 0.639901 0.625242 0.609468 0.485507 0.847965 0.845926 0.852320 0.853066 0.856400 0.857115 0.862387 0.857710 0.863365 0.860262 0.865578 0.859502 0.861607 0.856971 0.853675 0.858267 0.856794 0.862192 0.864724 0.856583 0.866177 0.891589 0.856815 0.863899 0.847512 0.853544 0.856093 0.860606 0.855702 0.863612 0.859499 0.853258 0.859555 0.859741 0.856345 0.863875 0.685613 0.663714 0.650814 0.653688 0.657781 0.662728 0.657600 0.659911 0.654772 0.662381 0.653144 0.658319 -0.236607 0.674775
total_amt', Period2020-11_M 0.458473 0.800321 0.794866 0.827572 0.821159 0.821685 0.824595 0.825836 0.821638 0.819573 0.818562 0.811670 0.829413 0.810374 0.790512 0.805156 0.811324 0.813686 0.776071 0.819269 0.816261 0.783222 0.793756 1.000000 0.826117 0.765628 0.751847 0.804346 0.795276 0.776687 0.815931 0.816836 0.746658 0.743482 0.800522 0.730007 0.837022 0.643298 0.617644 0.596428 0.619163 0.645709 0.653662 0.637574 0.640429 0.644054 0.643198 0.607116 0.482933 0.827033 0.823170 0.839549 0.838775 0.836375 0.845160 0.843069 0.835894 0.843875 0.838072 0.842200 0.841384 0.839261 0.832610 0.840949 0.834632 0.838791 0.849870 0.843759 0.839161 0.849412 0.830690 0.876984 0.844049 0.834519 0.824016 0.845602 0.836483 0.838458 0.847664 0.839343 0.842186 0.836770 0.846757 0.832861 0.848910 0.693071 0.664956 0.659089 0.657854 0.674329 0.669810 0.666289 0.663248 0.656360 0.667404 0.652119 0.668906 -0.257556 0.689751
total_amt', Period2020-12_M 0.505537 0.875595 0.864707 0.877416 0.879438 0.881403 0.879509 0.882305 0.884477 0.875705 0.881798 0.870316 0.888700 0.865941 0.860183 0.877783 0.866366 0.861713 0.844297 0.864159 0.852551 0.847025 0.851424 0.826117 1.000000 0.815409 0.803312 0.857624 0.839505 0.823415 0.865175 0.863391 0.790196 0.806318 0.836247 0.799932 0.883340 0.670343 0.645004 0.618703 0.645347 0.655855 0.682345 0.666876 0.672354 0.660114 0.648611 0.640331 0.513899 0.885136 0.884840 0.884587 0.890986 0.887294 0.891971 0.892698 0.892099 0.889714 0.896240 0.894014 0.896682 0.891403 0.883090 0.893379 0.893720 0.886409 0.891179 0.893655 0.890150 0.896164 0.891328 0.893927 0.920369 0.879853 0.882983 0.895722 0.888803 0.890501 0.895285 0.891745 0.890474 0.891990 0.893257 0.885006 0.896249 0.712751 0.681482 0.673947 0.679059 0.688090 0.688169 0.681234 0.680155 0.674944 0.684281 0.676305 0.682646 -0.266159 0.706695
total_amt', Period2021-01_M 0.475332 0.800278 0.790995 0.800524 0.801753 0.800141 0.804130 0.805844 0.808699 0.809148 0.804190 0.787800 0.803908 0.796519 0.784185 0.802319 0.798959 0.804348 0.776442 0.792638 0.799862 0.780185 0.773435 0.765628 0.815409 1.000000 0.753068 0.795844 0.789011 0.761779 0.814013 0.796879 0.760450 0.754640 0.792968 0.759878 0.828306 0.653574 0.620659 0.593945 0.616883 0.635610 0.649472 0.657766 0.655386 0.672920 0.638459 0.621594 0.498540 0.826927 0.822855 0.828691 0.828320 0.828446 0.831176 0.833155 0.831994 0.836803 0.831566 0.826846 0.836362 0.833559 0.828894 0.836179 0.832967 0.834465 0.842782 0.835311 0.840271 0.837671 0.827259 0.834488 0.840067 0.875907 0.834174 0.835654 0.838788 0.836818 0.843795 0.838308 0.838745 0.842528 0.835343 0.833453 0.838157 0.698680 0.675285 0.666183 0.662207 0.673722 0.676060 0.678518 0.672926 0.670954 0.678259 0.668173 0.673478 -0.287751 0.688967
total_amt', Period2021-02_M 0.415254 0.773230 0.767196 0.781997 0.783943 0.780155 0.782277 0.788315 0.792435 0.775294 0.780733 0.775049 0.785557 0.767838 0.769357 0.788457 0.773993 0.781152 0.747498 0.793667 0.773943 0.749586 0.770448 0.751847 0.803312 0.753068 1.000000 0.793621 0.766442 0.746870 0.794749 0.776078 0.743634 0.722422 0.770963 0.718047 0.815118 0.606273 0.574141 0.561752 0.581066 0.599953 0.602978 0.605969 0.619095 0.608305 0.602918 0.587923 0.453495 0.810037 0.811205 0.816737 0.817614 0.809714 0.819773 0.819775 0.827147 0.818336 0.814980 0.818779 0.819606 0.817766 0.811999 0.824670 0.817253 0.813023 0.820866 0.824153 0.814214 0.820814 0.811606 0.815329 0.823971 0.810237 0.857156 0.826597 0.822809 0.816305 0.825359 0.817188 0.820443 0.825624 0.822236 0.814392 0.826159 0.657298 0.624002 0.622298 0.620498 0.631581 0.627268 0.626695 0.625790 0.625327 0.630377 0.617407 0.621871 -0.256310 0.638166
total_amt', Period2021-03_M 0.482738 0.837261 0.836871 0.842405 0.841836 0.840020 0.846208 0.846252 0.846651 0.840272 0.841768 0.836132 0.850956 0.837059 0.830238 0.843919 0.838165 0.826486 0.795729 0.836056 0.822494 0.822932 0.817261 0.804346 0.857624 0.795844 0.793621 1.000000 0.816650 0.802094 0.829238 0.833204 0.771792 0.775037 0.830704 0.770031 0.851767 0.660021 0.635524 0.627351 0.643777 0.657956 0.677890 0.663152 0.666305 0.667574 0.653069 0.645942 0.497957 0.848762 0.853107 0.853374 0.854308 0.848513 0.860435 0.858640 0.857949 0.854747 0.852611 0.858948 0.859462 0.861288 0.855663 0.858192 0.860176 0.854299 0.862440 0.858998 0.854820 0.864524 0.848891 0.855297 0.862997 0.853340 0.850078 0.887800 0.854594 0.858874 0.862108 0.857279 0.857385 0.855611 0.861903 0.854579 0.863449 0.706977 0.677974 0.674358 0.674833 0.686381 0.686826 0.682636 0.677295 0.675826 0.680370 0.673343 0.682048 -0.264663 0.711260
total_amt', Period2021-04_M 0.477579 0.801300 0.796464 0.809728 0.817693 0.811467 0.810119 0.817480 0.820024 0.809211 0.814384 0.805849 0.818786 0.799749 0.809469 0.810127 0.807955 0.818018 0.778834 0.825752 0.823116 0.789009 0.815349 0.795276 0.839505 0.789011 0.766442 0.816650 1.000000 0.784325 0.821260 0.808775 0.762056 0.761180 0.810708 0.759477 0.839062 0.627065 0.587097 0.566314 0.583734 0.612427 0.617552 0.614708 0.634151 0.627048 0.596953 0.606619 0.492438 0.833457 0.832401 0.838989 0.844775 0.840916 0.843816 0.844805 0.845663 0.845130 0.845672 0.851205 0.845918 0.844403 0.845624 0.846180 0.846650 0.850693 0.851523 0.846847 0.849836 0.848851 0.841230 0.847128 0.853788 0.841850 0.841145 0.844547 0.878431 0.846598 0.852480 0.844497 0.850550 0.849588 0.852824 0.843407 0.852731 0.675977 0.637803 0.636735 0.632547 0.649073 0.643742 0.639502 0.646275 0.634655 0.644966 0.633539 0.643505 -0.265520 0.666261
total_amt', Period2021-05_M 0.460709 0.808359 0.800576 0.821268 0.820705 0.820852 0.823106 0.818173 0.819695 0.813777 0.809214 0.810927 0.822693 0.798877 0.793411 0.827840 0.812166 0.810488 0.770877 0.801458 0.794878 0.778451 0.780432 0.776687 0.823415 0.761779 0.746870 0.802094 0.784325 1.000000 0.810763 0.800745 0.740192 0.753670 0.786160 0.737556 0.832624 0.645610 0.608060 0.606820 0.624844 0.632315 0.652936 0.644667 0.659245 0.637699 0.635143 0.609416 0.477867 0.821174 0.820285 0.834354 0.829801 0.825002 0.835081 0.831893 0.834351 0.830942 0.823723 0.832225 0.835571 0.829002 0.824469 0.834639 0.839882 0.835133 0.841168 0.833140 0.829678 0.826085 0.828289 0.835415 0.838782 0.825547 0.829252 0.833328 0.825350 0.864922 0.835499 0.830185 0.841012 0.829128 0.835904 0.827985 0.840924 0.682073 0.651841 0.645395 0.651801 0.658279 0.660551 0.656361 0.653789 0.646701 0.662196 0.651326 0.658233 -0.281386 0.678258
total_amt', Period2021-06_M 0.461116 0.837400 0.832140 0.847173 0.845892 0.845813 0.847522 0.849906 0.852928 0.840542 0.847662 0.840910 0.848562 0.833320 0.837313 0.842586 0.835395 0.834491 0.796804 0.843469 0.838124 0.818017 0.818086 0.815931 0.865175 0.814013 0.794749 0.829238 0.821260 0.810763 1.000000 0.840568 0.781678 0.802595 0.830789 0.774766 0.867800 0.675497 0.632633 0.613937 0.639062 0.651142 0.666822 0.670441 0.682974 0.668824 0.651563 0.629076 0.492030 0.867395 0.872866 0.875990 0.879482 0.873069 0.880029 0.879515 0.882775 0.875472 0.877032 0.882089 0.883489 0.885047 0.878974 0.886203 0.884154 0.885528 0.889857 0.883911 0.882278 0.885690 0.878975 0.883382 0.890442 0.879342 0.879574 0.887600 0.882491 0.884063 0.912472 0.884522 0.886666 0.887948 0.887688 0.879331 0.891030 0.727882 0.692078 0.691239 0.695940 0.699393 0.699280 0.693067 0.699583 0.687320 0.705856 0.692138 0.699529 -0.316324 0.709477
total_amt', Period2021-07_M 0.478910 0.847710 0.839360 0.841998 0.850282 0.855812 0.850500 0.858493 0.863354 0.849302 0.848022 0.844699 0.857952 0.837513 0.836509 0.846562 0.838280 0.832271 0.797548 0.835033 0.832145 0.824686 0.815409 0.816836 0.863391 0.796879 0.776078 0.833204 0.808775 0.800745 0.840568 1.000000 0.772714 0.801728 0.845850 0.774571 0.861364 0.675025 0.647721 0.630640 0.650572 0.666436 0.675242 0.674190 0.681045 0.680159 0.641775 0.643938 0.496206 0.858029 0.857599 0.856516 0.858813 0.859492 0.866968 0.867560 0.868081 0.864561 0.862307 0.868110 0.870231 0.865490 0.860657 0.864568 0.864139 0.868226 0.869002 0.870599 0.865589 0.873158 0.867138 0.869573 0.877406 0.860408 0.857013 0.869511 0.867627 0.868917 0.872879 0.889547 0.867957 0.871057 0.873343 0.858957 0.876728 0.715777 0.687885 0.682778 0.684271 0.695373 0.691893 0.689911 0.690176 0.688407 0.688227 0.680297 0.691764 -0.302711 0.722674
total_amt', Period2021-08_M 0.475206 0.778866 0.790242 0.788268 0.785117 0.792227 0.794651 0.787473 0.789950 0.780298 0.790080 0.778445 0.782097 0.778458 0.766017 0.788282 0.766574 0.775290 0.749587 0.781385 0.776504 0.768973 0.756906 0.746658 0.790196 0.760450 0.743634 0.771792 0.762056 0.740192 0.781678 0.772714 1.000000 0.741208 0.773351 0.715888 0.812595 0.629066 0.594217 0.575848 0.594476 0.599730 0.623365 0.616294 0.632242 0.638999 0.595940 0.598759 0.484118 0.803068 0.815242 0.814943 0.813133 0.812509 0.818008 0.814901 0.815709 0.813548 0.811621 0.808976 0.813579 0.827901 0.809988 0.813509 0.808143 0.810175 0.821607 0.817686 0.815161 0.819689 0.808631 0.812972 0.821615 0.812190 0.802824 0.817210 0.820722 0.815768 0.815288 0.814668 0.839336 0.815995 0.814881 0.809838 0.822088 0.673349 0.648396 0.641121 0.638037 0.646214 0.649635 0.644750 0.644037 0.641402 0.648356 0.633065 0.640839 -0.287198 0.651812
total_amt', Period2021-09_M 0.483292 0.797004 0.791644 0.807852 0.794618 0.808272 0.814241 0.814664 0.810299 0.799661 0.805233 0.799951 0.808274 0.805156 0.786983 0.801736 0.782726 0.795886 0.745775 0.783119 0.774239 0.778712 0.780611 0.743482 0.806318 0.754640 0.722422 0.775037 0.761180 0.753670 0.802595 0.801728 0.741208 1.000000 0.765884 0.738678 0.805963 0.643848 0.608903 0.592061 0.620547 0.613804 0.633848 0.623896 0.640626 0.640622 0.614235 0.599072 0.501903 0.796010 0.793907 0.807301 0.795793 0.800381 0.812641 0.807560 0.803491 0.802012 0.800364 0.813086 0.808460 0.809180 0.801170 0.806146 0.801620 0.812169 0.810846 0.805930 0.800896 0.809093 0.812074 0.804639 0.809342 0.800130 0.798934 0.808183 0.800815 0.803747 0.811085 0.809659 0.813269 0.835917 0.809729 0.803070 0.812219 0.671201 0.632295 0.632333 0.631868 0.635526 0.638216 0.637049 0.637886 0.628512 0.639768 0.632765 0.635479 -0.275483 0.662267
total_amt', Period2021-10_M 0.476297 0.834971 0.824704 0.843001 0.845026 0.844530 0.843420 0.851908 0.851879 0.844426 0.837020 0.835893 0.839210 0.828551 0.824944 0.836418 0.832518 0.824940 0.794994 0.827048 0.820864 0.806085 0.823262 0.800522 0.836247 0.792968 0.770963 0.830704 0.810708 0.786160 0.830789 0.845850 0.773351 0.765884 1.000000 0.759562 0.844598 0.659359 0.649688 0.614980 0.647098 0.650972 0.659179 0.656405 0.665522 0.682014 0.642610 0.625860 0.483651 0.839599 0.844295 0.848654 0.848252 0.849656 0.845425 0.852940 0.850754 0.851162 0.844691 0.849273 0.852317 0.852969 0.843693 0.847782 0.853793 0.854646 0.858662 0.855844 0.850594 0.854147 0.851381 0.856394 0.860173 0.851020 0.839201 0.854586 0.851739 0.853495 0.856025 0.853465 0.851245 0.848588 0.880462 0.849140 0.858108 0.703209 0.674024 0.660750 0.668346 0.677952 0.677476 0.672796 0.670578 0.672335 0.675826 0.664605 0.675287 -0.292653 0.709599
total_amt', Period2021-11_M 0.446306 0.789355 0.786317 0.787829 0.782626 0.797268 0.791439 0.796234 0.793491 0.787031 0.788109 0.783851 0.799571 0.774327 0.790430 0.785224 0.773285 0.776832 0.744705 0.784321 0.762720 0.740755 0.768379 0.730007 0.799932 0.759878 0.718047 0.770031 0.759477 0.737556 0.774766 0.774571 0.715888 0.738678 0.759562 1.000000 0.806195 0.628101 0.606975 0.590814 0.606378 0.623140 0.632374 0.634487 0.639084 0.641317 0.611075 0.609270 0.462200 0.797547 0.802583 0.797377 0.796717 0.798538 0.801744 0.803974 0.800960 0.802815 0.805998 0.804020 0.806325 0.797452 0.803746 0.803126 0.806219 0.802144 0.806326 0.803352 0.797764 0.800672 0.801389 0.800262 0.809490 0.801373 0.800263 0.804833 0.803776 0.806309 0.800385 0.805457 0.808275 0.809525 0.803156 0.830238 0.812390 0.670213 0.642693 0.634674 0.636572 0.648841 0.648653 0.641581 0.645226 0.646185 0.645588 0.640701 0.644104 -0.277398 0.666978
total_amt', Period2021-12_M 0.471941 0.866196 0.869827 0.878504 0.876798 0.880724 0.886241 0.884223 0.891064 0.876235 0.878820 0.870674 0.888657 0.867870 0.868080 0.869897 0.859573 0.859753 0.829917 0.856908 0.856917 0.838687 0.840438 0.837022 0.883340 0.828306 0.815118 0.851767 0.839062 0.832624 0.867800 0.861364 0.812595 0.805963 0.844598 0.806195 1.000000 0.695607 0.656057 0.634426 0.661567 0.682089 0.690018 0.693509 0.696125 0.697670 0.671312 0.658042 0.498203 0.882349 0.887027 0.891025 0.890029 0.890360 0.897324 0.894151 0.898391 0.891175 0.895028 0.895872 0.897528 0.892282 0.891653 0.893022 0.887143 0.891476 0.901088 0.894689 0.895363 0.896496 0.888408 0.896863 0.902532 0.889742 0.886847 0.895471 0.892859 0.893501 0.900457 0.893634 0.898360 0.895177 0.896317 0.889986 0.915550 0.734516 0.700906 0.695031 0.697487 0.706582 0.707091 0.705079 0.701228 0.699030 0.706657 0.695593 0.701500 -0.318287 0.731005
total_amt', Period2022-01_M 0.425631 0.738596 0.720941 0.744890 0.754006 0.747312 0.749527 0.752446 0.748756 0.748833 0.746373 0.751966 0.761777 0.739898 0.737732 0.752503 0.674635 0.657608 0.623147 0.659585 0.658404 0.671684 0.640238 0.643298 0.670343 0.653574 0.606273 0.660021 0.627065 0.645610 0.675497 0.675025 0.629066 0.643848 0.659359 0.628101 0.695607 1.000000 0.883676 0.881338 0.896079 0.894339 0.855823 0.857600 0.858190 0.865026 0.851335 0.823180 0.368353 0.639137 0.632641 0.647443 0.648542 0.644262 0.649411 0.652859 0.647894 0.648796 0.649143 0.656003 0.654720 0.653893 0.654266 0.647110 0.643662 0.647402 0.657238 0.650441 0.655151 0.661767 0.636302 0.654463 0.657859 0.654801 0.633372 0.652921 0.643644 0.653296 0.659509 0.654579 0.651456 0.658455 0.649211 0.644061 0.650143 0.870999 0.841274 0.847472 0.851319 0.851104 0.851914 0.850658 0.849944 0.845167 0.854856 0.849208 0.850431 -0.414632 0.881193
total_amt', Period2022-02_M 0.423659 0.723252 0.714472 0.729361 0.736801 0.735611 0.737941 0.739745 0.734571 0.734449 0.727265 0.737661 0.751094 0.727243 0.723896 0.733344 0.644672 0.632602 0.606901 0.637151 0.628213 0.648304 0.635065 0.617644 0.645004 0.620659 0.574141 0.635524 0.587097 0.608060 0.632633 0.647721 0.594217 0.608903 0.649688 0.606975 0.656057 0.883676 1.000000 0.900810 0.904186 0.898019 0.858006 0.859814 0.856983 0.879038 0.845883 0.824174 0.350528 0.594229 0.600171 0.604270 0.610641 0.607653 0.608878 0.612684 0.604810 0.607961 0.609502 0.618120 0.612870 0.613016 0.612110 0.605830 0.598896 0.607639 0.614022 0.614259 0.612405 0.623802 0.594838 0.614687 0.617508 0.606810 0.590488 0.606438 0.596235 0.614028 0.614823 0.611080 0.606073 0.617068 0.610966 0.611118 0.606469 0.825256 0.871897 0.842511 0.840038 0.839816 0.846499 0.843304 0.843900 0.843595 0.844115 0.845864 0.845610 -0.410968 0.882408
total_amt', Period2022-03_M 0.413413 0.710342 0.697890 0.718262 0.723432 0.716229 0.718949 0.725418 0.710101 0.715830 0.713202 0.714785 0.728168 0.707702 0.703950 0.718754 0.631092 0.603902 0.585160 0.614829 0.614694 0.621175 0.598678 0.596428 0.618703 0.593945 0.561752 0.627351 0.566314 0.606820 0.613937 0.630640 0.575848 0.592061 0.614980 0.590814 0.634426 0.881338 0.900810 1.000000 0.903376 0.895180 0.847539 0.848856 0.852346 0.858093 0.839957 0.826637 0.337983 0.578121 0.581321 0.585532 0.586439 0.586210 0.589333 0.595487 0.580015 0.585248 0.590743 0.591763 0.591939 0.595720 0.591221 0.587687 0.582942 0.586049 0.594395 0.588250 0.593187 0.597142 0.574834 0.591694 0.595496 0.587042 0.568729 0.590446 0.573805 0.598556 0.593725 0.589418 0.588579 0.599657 0.591634 0.586675 0.585431 0.813924 0.831407 0.850891 0.829609 0.833267 0.835346 0.833152 0.833228 0.828886 0.832791 0.833495 0.836847 -0.401775 0.875140
total_amt', Period2022-04_M 0.420878 0.727539 0.707402 0.728907 0.741517 0.738930 0.739820 0.745081 0.731262 0.737366 0.731437 0.740507 0.749447 0.728251 0.730249 0.731418 0.657191 0.630553 0.608398 0.635527 0.630875 0.657036 0.616873 0.619163 0.645347 0.616883 0.581066 0.643777 0.583734 0.624844 0.639062 0.650572 0.594476 0.620547 0.647098 0.606378 0.661567 0.896079 0.904186 0.903376 1.000000 0.906535 0.862011 0.869383 0.858111 0.868710 0.848555 0.818492 0.348927 0.602247 0.601175 0.609924 0.616415 0.614099 0.616432 0.624001 0.611890 0.610647 0.615798 0.623392 0.621112 0.627100 0.620218 0.611226 0.611938 0.615337 0.622076 0.617659 0.617275 0.628607 0.602951 0.625915 0.625993 0.617008 0.598934 0.620035 0.601691 0.623964 0.622822 0.614901 0.613598 0.625401 0.621812 0.614304 0.612539 0.836083 0.841552 0.846604 0.869563 0.850124 0.852970 0.851385 0.849214 0.845094 0.852597 0.847493 0.848946 -0.413467 0.884264
total_amt', Period2022-05_M 0.433582 0.730376 0.716387 0.732597 0.742116 0.736502 0.737101 0.744286 0.734056 0.736075 0.733245 0.739898 0.753157 0.733149 0.728000 0.738159 0.669268 0.644195 0.621053 0.645198 0.652513 0.662651 0.624113 0.645709 0.655855 0.635610 0.599953 0.657956 0.612427 0.632315 0.651142 0.666436 0.599730 0.613804 0.650972 0.623140 0.682089 0.894339 0.898019 0.895180 0.906535 1.000000 0.869486 0.873433 0.879291 0.875752 0.854501 0.831472 0.381591 0.631492 0.628957 0.632642 0.635399 0.633001 0.635399 0.644979 0.632256 0.634265 0.637817 0.643721 0.642331 0.645360 0.638852 0.638075 0.633727 0.635794 0.646584 0.636987 0.642210 0.654656 0.620289 0.644315 0.646757 0.635991 0.626387 0.644692 0.628172 0.641113 0.643251 0.636049 0.640025 0.644344 0.641474 0.635523 0.634962 0.858392 0.865691 0.869395 0.867248 0.887618 0.871093 0.871987 0.869243 0.865247 0.867683 0.868579 0.873182 -0.410973 0.895476
total_amt', Period2022-06_M 0.408367 0.716978 0.700699 0.720922 0.720047 0.731702 0.720356 0.723619 0.716172 0.723171 0.716529 0.722863 0.734642 0.714429 0.719419 0.715960 0.672347 0.652644 0.645636 0.662911 0.656614 0.686581 0.643810 0.653662 0.682345 0.649472 0.602978 0.677890 0.617552 0.652936 0.666822 0.675242 0.623365 0.633848 0.659179 0.632374 0.690018 0.855823 0.858006 0.847539 0.862011 0.869486 1.000000 0.857527 0.854271 0.861619 0.833054 0.808981 0.370465 0.663735 0.658202 0.666454 0.667778 0.669674 0.667593 0.672743 0.663650 0.668631 0.665545 0.677579 0.675265 0.673515 0.675140 0.665131 0.663296 0.668375 0.677529 0.671868 0.671867 0.683249 0.656333 0.674232 0.675864 0.666366 0.651256 0.679262 0.659500 0.675735 0.680391 0.673694 0.671988 0.675514 0.672309 0.670243 0.668539 0.864814 0.870410 0.873804 0.871857 0.871964 0.892583 0.871604 0.869112 0.868073 0.879390 0.874152 0.873739 -0.394793 0.873737
total_amt', Period2022-07_M 0.437908 0.706839 0.696530 0.711025 0.721130 0.715696 0.714604 0.726857 0.715725 0.719929 0.714106 0.722330 0.732053 0.707699 0.716416 0.714486 0.657171 0.645686 0.629752 0.666026 0.652862 0.654940 0.642746 0.637574 0.666876 0.657766 0.605969 0.663152 0.614708 0.644667 0.670441 0.674190 0.616294 0.623896 0.656405 0.634487 0.693509 0.857600 0.859814 0.848856 0.869383 0.873433 0.857527 1.000000 0.860964 0.852953 0.845685 0.799701 0.401852 0.646107 0.651974 0.654080 0.659429 0.653672 0.657059 0.667579 0.656354 0.657422 0.659840 0.666325 0.665693 0.659552 0.662461 0.655458 0.654970 0.662050 0.673223 0.663065 0.664802 0.670693 0.645587 0.665987 0.667774 0.670563 0.649074 0.666764 0.654658 0.672850 0.673810 0.661167 0.659957 0.669148 0.663814 0.657405 0.667213 0.859580 0.869389 0.867990 0.867408 0.870923 0.869245 0.890620 0.869859 0.864652 0.875819 0.863439 0.873687 -0.395597 0.877524
total_amt', Period2022-08_M 0.399066 0.712005 0.703610 0.717676 0.720828 0.720976 0.719936 0.726148 0.715935 0.712678 0.716521 0.721674 0.730632 0.707351 0.710895 0.723762 0.667004 0.659361 0.619612 0.652286 0.657128 0.666492 0.630037 0.640429 0.672354 0.655386 0.619095 0.666305 0.634151 0.659245 0.682974 0.681045 0.632242 0.640626 0.665522 0.639084 0.696125 0.858190 0.856983 0.852346 0.858111 0.879291 0.854271 0.860964 1.000000 0.857891 0.837032 0.808178 0.376166 0.665294 0.665598 0.673192 0.669788 0.669503 0.674201 0.683420 0.670033 0.667713 0.667969 0.682814 0.680643 0.676958 0.674987 0.673833 0.668325 0.674031 0.678393 0.673480 0.675484 0.685453 0.663608 0.676948 0.681382 0.673372 0.660545 0.676221 0.669028 0.685333 0.681682 0.678806 0.677184 0.683037 0.678189 0.678653 0.676513 0.873827 0.881946 0.883200 0.885347 0.881052 0.887219 0.885121 0.906348 0.879271 0.885680 0.884814 0.888905 -0.404256 0.887157
total_amt', Period2022-09_M 0.419300 0.723632 0.710236 0.725661 0.729735 0.726998 0.725851 0.730342 0.725794 0.726541 0.719490 0.722827 0.740601 0.724894 0.717045 0.734158 0.674628 0.657596 0.638981 0.662503 0.659793 0.664407 0.639901 0.644054 0.660114 0.672920 0.608305 0.667574 0.627048 0.637699 0.668824 0.680159 0.638999 0.640622 0.682014 0.641317 0.697670 0.865026 0.879038 0.858093 0.868710 0.875752 0.861619 0.852953 0.857891 1.000000 0.833670 0.817470 0.378093 0.650808 0.651992 0.655440 0.661375 0.657373 0.657397 0.660513 0.654826 0.657853 0.657038 0.662240 0.664108 0.668220 0.662093 0.660239 0.653223 0.655469 0.669629 0.662167 0.665591 0.667853 0.647007 0.660919 0.665501 0.667436 0.643732 0.663677 0.653002 0.665010 0.666665 0.664568 0.661227 0.670442 0.664846 0.658287 0.658597 0.853545 0.863684 0.859313 0.860971 0.859493 0.868626 0.863077 0.865410 0.881175 0.859560 0.863772 0.866008 -0.402856 0.878777
total_amt', Period2022-10_M 0.393458 0.685961 0.670724 0.692241 0.703767 0.699281 0.694761 0.707365 0.694738 0.698218 0.690752 0.701690 0.710046 0.688702 0.691949 0.702185 0.646981 0.637444 0.614928 0.640796 0.632172 0.635497 0.625242 0.643198 0.648611 0.638459 0.602918 0.653069 0.596953 0.635143 0.651563 0.641775 0.595940 0.614235 0.642610 0.611075 0.671312 0.851335 0.845883 0.839957 0.848555 0.854501 0.833054 0.845685 0.837032 0.833670 1.000000 0.795307 0.355840 0.626397 0.620035 0.629361 0.634533 0.633480 0.626954 0.642347 0.629003 0.631144 0.631191 0.645508 0.638280 0.639239 0.635537 0.630193 0.632313 0.634111 0.645963 0.638019 0.634964 0.646139 0.621966 0.643391 0.645853 0.639584 0.621754 0.640763 0.624286 0.646771 0.647751 0.634153 0.634086 0.642370 0.642102 0.637462 0.635385 0.837831 0.840799 0.843504 0.844079 0.841560 0.846076 0.845524 0.844315 0.841544 0.870270 0.842455 0.840952 -0.386815 0.855533
total_amt', Period2022-11_M 0.405663 0.683284 0.671849 0.684701 0.687573 0.687091 0.691536 0.697302 0.690413 0.684080 0.684669 0.689194 0.704329 0.687473 0.680069 0.693752 0.627969 0.613387 0.605353 0.640435 0.621899 0.635092 0.609468 0.607116 0.640331 0.621594 0.587923 0.645942 0.606619 0.609416 0.629076 0.643938 0.598759 0.599072 0.625860 0.609270 0.658042 0.823180 0.824174 0.826637 0.818492 0.831472 0.808981 0.799701 0.808178 0.817470 0.795307 1.000000 0.379678 0.620293 0.618865 0.623068 0.624223 0.623864 0.626933 0.632336 0.621729 0.622780 0.629101 0.633275 0.632905 0.635456 0.631277 0.625703 0.620448 0.621132 0.638052 0.628793 0.630334 0.638941 0.610827 0.626596 0.639989 0.633427 0.618596 0.634508 0.630324 0.632335 0.635841 0.627578 0.630652 0.630006 0.633811 0.625715 0.629749 0.812652 0.816538 0.829292 0.813356 0.822461 0.824874 0.819645 0.825852 0.818578 0.821202 0.847240 0.822226 -0.398418 0.831043
trans_count', Period2018-12_M 0.758131 0.469361 0.462963 0.468197 0.466034 0.465663 0.468306 0.478752 0.467535 0.484111 0.482206 0.467490 0.472337 0.449970 0.469047 0.464206 0.485464 0.479081 0.472909 0.510456 0.501250 0.490791 0.485507 0.482933 0.513899 0.498540 0.453495 0.497957 0.492438 0.477867 0.492030 0.496206 0.484118 0.501903 0.483651 0.462200 0.498203 0.368353 0.350528 0.337983 0.348927 0.381591 0.370465 0.401852 0.376166 0.378093 0.355840 0.379678 1.000000 0.532151 0.533577 0.541968 0.534982 0.530538 0.531755 0.542062 0.532554 0.546454 0.541983 0.536613 0.538243 0.535485 0.535265 0.534837 0.539324 0.527816 0.543886 0.545793 0.542361 0.542939 0.533050 0.527431 0.547133 0.540361 0.529033 0.546873 0.539859 0.532083 0.544141 0.543936 0.541859 0.533599 0.541509 0.533113 0.547174 0.431212 0.423790 0.410006 0.408992 0.412935 0.416941 0.421444 0.417321 0.416938 0.409012 0.412859 0.414490 -0.188891 0.399576
trans_count', Period2019-01_M 0.452343 0.867305 0.821353 0.835327 0.835619 0.841041 0.840378 0.841287 0.843100 0.834109 0.839741 0.822596 0.838494 0.813013 0.805934 0.829202 0.850917 0.864858 0.831444 0.862340 0.867221 0.834040 0.847965 0.827033 0.885136 0.826927 0.810037 0.848762 0.833457 0.821174 0.867395 0.858029 0.803068 0.796010 0.839599 0.797547 0.882349 0.639137 0.594229 0.578121 0.602247 0.631492 0.663735 0.646107 0.665294 0.650808 0.626397 0.620293 0.532151 1.000000 0.946259 0.956943 0.954025 0.959670 0.961990 0.962040 0.963675 0.955838 0.959507 0.959950 0.967187 0.948273 0.952335 0.959841 0.958016 0.958585 0.963606 0.960584 0.962100 0.956810 0.957884 0.955395 0.965385 0.951658 0.952330 0.959168 0.956031 0.954068 0.958808 0.958951 0.960164 0.954914 0.956624 0.957738 0.964976 0.775488 0.735984 0.731423 0.733094 0.746574 0.746206 0.741239 0.742995 0.740085 0.741331 0.732299 0.744807 -0.347326 0.692425
trans_count', Period2019-02_M 0.462406 0.827236 0.870161 0.837044 0.840877 0.842734 0.838535 0.841762 0.840316 0.839286 0.842873 0.822838 0.837416 0.818792 0.812655 0.823799 0.849549 0.864501 0.823604 0.862528 0.862084 0.832155 0.845926 0.823170 0.884840 0.822855 0.811205 0.853107 0.832401 0.820285 0.872866 0.857599 0.815242 0.793907 0.844295 0.802583 0.887027 0.632641 0.600171 0.581321 0.601175 0.628957 0.658202 0.651974 0.665598 0.651992 0.620035 0.618865 0.533577 0.946259 1.000000 0.953475 0.954259 0.955048 0.956958 0.958207 0.957097 0.956091 0.957601 0.953654 0.962462 0.949588 0.946526 0.954489 0.956433 0.957202 0.957761 0.958784 0.958082 0.950765 0.952482 0.955189 0.964405 0.949921 0.948162 0.953344 0.951654 0.953512 0.956888 0.957682 0.959193 0.953113 0.954428 0.955970 0.962359 0.769627 0.732719 0.725674 0.729947 0.739931 0.737556 0.736602 0.735203 0.734834 0.733239 0.727435 0.737034 -0.343110 0.687360
trans_count', Period2019-03_M 0.458912 0.834492 0.830100 0.870714 0.843371 0.847101 0.846845 0.848015 0.848133 0.843196 0.847223 0.830398 0.840928 0.813338 0.808823 0.825855 0.852736 0.867290 0.827087 0.872208 0.872678 0.843911 0.852320 0.839549 0.884587 0.828691 0.816737 0.853374 0.838989 0.834354 0.875990 0.856516 0.814943 0.807301 0.848654 0.797377 0.891025 0.647443 0.604270 0.585532 0.609924 0.632642 0.666454 0.654080 0.673192 0.655440 0.629361 0.623068 0.541968 0.956943 0.953475 1.000000 0.962186 0.966485 0.969900 0.967576 0.970505 0.965686 0.968131 0.965925 0.973813 0.959636 0.956855 0.964288 0.967789 0.968945 0.967260 0.967740 0.969285 0.964522 0.963488 0.964573 0.971300 0.957540 0.956810 0.965124 0.963648 0.966080 0.967744 0.966647 0.967415 0.963395 0.965989 0.963370 0.973897 0.780958 0.741742 0.736975 0.741081 0.749054 0.750041 0.745717 0.747340 0.743815 0.744029 0.739627 0.749861 -0.357593 0.696378
trans_count', Period2019-04_M 0.468279 0.837639 0.834151 0.844074 0.874956 0.849373 0.845405 0.847243 0.846596 0.840499 0.846124 0.831155 0.842828 0.817815 0.812738 0.831412 0.860531 0.869178 0.832778 0.872781 0.871187 0.842371 0.853066 0.838775 0.890986 0.828320 0.817614 0.854308 0.844775 0.829801 0.879482 0.858813 0.813133 0.795793 0.848252 0.796717 0.890029 0.648542 0.610641 0.586439 0.616415 0.635399 0.667778 0.659429 0.669788 0.661375 0.634533 0.624223 0.534982 0.954025 0.954259 0.962186 1.000000 0.965587 0.966684 0.965855 0.966840 0.960908 0.966326 0.963155 0.971309 0.955482 0.955960 0.964520 0.964024 0.963009 0.966872 0.965654 0.966105 0.960219 0.961558 0.962062 0.971439 0.956711 0.953235 0.961115 0.962026 0.962135 0.964745 0.963313 0.965198 0.960395 0.962532 0.959376 0.968034 0.780228 0.745064 0.738785 0.740690 0.753052 0.750198 0.747987 0.747760 0.746225 0.747917 0.741938 0.750305 -0.347712 0.698029
trans_count', Period2019-05_M 0.454956 0.832431 0.828582 0.842022 0.845523 0.873533 0.844163 0.845775 0.845386 0.841219 0.846933 0.828212 0.840135 0.813124 0.811102 0.826711 0.851263 0.863429 0.835224 0.860464 0.869819 0.845875 0.856400 0.836375 0.887294 0.828446 0.809714 0.848513 0.840916 0.825002 0.873069 0.859492 0.812509 0.800381 0.849656 0.798538 0.890360 0.644262 0.607653 0.586210 0.614099 0.633001 0.669674 0.653672 0.669503 0.657373 0.633480 0.623864 0.530538 0.959670 0.955048 0.966485 0.965587 1.000000 0.968830 0.969324 0.970262 0.966035 0.968919 0.965026 0.975987 0.957584 0.957696 0.963518 0.965749 0.967295 0.967208 0.965886 0.970198 0.965274 0.966138 0.963385 0.972590 0.958093 0.957228 0.961153 0.962342 0.965299 0.966790 0.967326 0.967738 0.963388 0.965503 0.964357 0.971579 0.783582 0.746225 0.740327 0.743999 0.754230 0.752125 0.750236 0.750630 0.747060 0.747656 0.740346 0.752428 -0.351065 0.699534
trans_count', Period2019-06_M 0.462828 0.842734 0.838449 0.852381 0.852257 0.854790 0.874446 0.857282 0.857034 0.849991 0.854744 0.839350 0.850908 0.825166 0.819685 0.834914 0.862802 0.874092 0.834572 0.875285 0.874352 0.850917 0.857115 0.845160 0.891971 0.831176 0.819773 0.860435 0.843816 0.835081 0.880029 0.866968 0.818008 0.812641 0.845425 0.801744 0.897324 0.649411 0.608878 0.589333 0.616432 0.635399 0.667593 0.657059 0.674201 0.657397 0.626954 0.626933 0.531755 0.961990 0.956958 0.969900 0.966684 0.968830 1.000000 0.972443 0.973355 0.968201 0.969121 0.970365 0.976758 0.959636 0.959296 0.967160 0.968969 0.968955 0.972159 0.969935 0.970373 0.966377 0.966962 0.966314 0.974296 0.957975 0.958327 0.966465 0.961785 0.965365 0.968006 0.968356 0.970575 0.964871 0.967798 0.964403 0.973892 0.779776 0.742189 0.737970 0.740574 0.751734 0.750194 0.745826 0.747912 0.744242 0.744891 0.739755 0.748385 -0.354487 0.697402
trans_count', Period2019-07_M 0.464256 0.842286 0.836692 0.850870 0.850693 0.853106 0.852801 0.876209 0.853536 0.847656 0.852597 0.838255 0.849921 0.820867 0.819231 0.836654 0.860017 0.876971 0.832209 0.867922 0.878364 0.850918 0.862387 0.843069 0.892698 0.833155 0.819775 0.858640 0.844805 0.831893 0.879515 0.867560 0.814901 0.807560 0.852940 0.803974 0.894151 0.652859 0.612684 0.595487 0.624001 0.644979 0.672743 0.667579 0.683420 0.660513 0.642347 0.632336 0.542062 0.962040 0.958207 0.967576 0.965855 0.969324 0.972443 1.000000 0.970268 0.967095 0.968110 0.968919 0.977139 0.960888 0.959602 0.969290 0.968491 0.970831 0.971742 0.970058 0.970083 0.967152 0.967789 0.968827 0.974754 0.958479 0.959338 0.966234 0.964588 0.967582 0.968223 0.968941 0.969618 0.964286 0.968400 0.963448 0.974856 0.787370 0.752561 0.745791 0.749167 0.759328 0.758590 0.754874 0.755431 0.752706 0.754414 0.744880 0.757298 -0.352189 0.706792
trans_count', Period2019-08_M 0.456326 0.843911 0.833723 0.849431 0.849340 0.854924 0.851066 0.850850 0.872691 0.848204 0.852617 0.836124 0.847840 0.822599 0.818147 0.834789 0.858073 0.869166 0.835806 0.873284 0.874050 0.846457 0.857710 0.835894 0.892099 0.831994 0.827147 0.857949 0.845663 0.834351 0.882775 0.868081 0.815709 0.803491 0.850754 0.800960 0.898391 0.647894 0.604810 0.580015 0.611890 0.632256 0.663650 0.656354 0.670033 0.654826 0.629003 0.621729 0.532554 0.963675 0.957097 0.970505 0.966840 0.970262 0.973355 0.970268 1.000000 0.970898 0.969888 0.970644 0.978208 0.964473 0.961022 0.969049 0.968013 0.969957 0.971117 0.971865 0.971139 0.969219 0.968834 0.968869 0.976053 0.960800 0.961154 0.967372 0.965271 0.968344 0.970177 0.971664 0.970743 0.967188 0.967040 0.964941 0.975982 0.782188 0.741296 0.734382 0.740281 0.749740 0.747682 0.746185 0.745657 0.743632 0.744906 0.737392 0.746790 -0.349196 0.694985
trans_count', Period2019-09_M 0.470820 0.836527 0.835141 0.848160 0.846332 0.850055 0.848782 0.850547 0.852616 0.872030 0.852242 0.834853 0.844658 0.819227 0.818629 0.831284 0.855195 0.866643 0.834563 0.870173 0.868886 0.848898 0.863365 0.843875 0.889714 0.836803 0.818336 0.854747 0.845130 0.830942 0.875472 0.864561 0.813548 0.802012 0.851162 0.802815 0.891175 0.648796 0.607961 0.585248 0.610647 0.634265 0.668631 0.657422 0.667713 0.657853 0.631144 0.622780 0.546454 0.955838 0.956091 0.965686 0.960908 0.966035 0.968201 0.967095 0.970898 1.000000 0.967591 0.963422 0.971991 0.958227 0.960281 0.965292 0.963815 0.964819 0.965588 0.967853 0.967188 0.966818 0.964603 0.963260 0.972173 0.959070 0.953509 0.962640 0.963302 0.962944 0.965442 0.969243 0.967643 0.961759 0.965315 0.963436 0.970957 0.781901 0.742125 0.735249 0.739738 0.747635 0.747517 0.745144 0.745429 0.743720 0.742906 0.736241 0.747488 -0.351742 0.699467
trans_count', Period2019-10_M 0.459396 0.843813 0.837570 0.852769 0.852477 0.856715 0.851791 0.853757 0.852823 0.851707 0.875611 0.835944 0.852335 0.823674 0.819330 0.834757 0.855024 0.865995 0.836785 0.872540 0.876637 0.846841 0.860262 0.838072 0.896240 0.831566 0.814980 0.852611 0.845672 0.823723 0.877032 0.862307 0.811621 0.800364 0.844691 0.805998 0.895028 0.649143 0.609502 0.590743 0.615798 0.637817 0.665545 0.659840 0.667969 0.657038 0.631191 0.629101 0.541983 0.959507 0.957601 0.968131 0.966326 0.968919 0.969121 0.968110 0.969888 0.967591 1.000000 0.966015 0.975437 0.963051 0.958820 0.966804 0.965408 0.966243 0.969435 0.968722 0.970144 0.966738 0.964837 0.965703 0.975214 0.956953 0.954982 0.965155 0.962992 0.962608 0.965642 0.966136 0.967799 0.961547 0.966302 0.963415 0.971664 0.780135 0.741882 0.739841 0.740984 0.752118 0.751737 0.747354 0.749050 0.743306 0.746829 0.741341 0.749673 -0.346931 0.698112
trans_count', Period2019-11_M 0.458764 0.847733 0.837294 0.854246 0.855461 0.857707 0.857041 0.859127 0.858899 0.850842 0.856160 0.867190 0.854351 0.824633 0.826910 0.839552 0.864256 0.873185 0.836454 0.875436 0.877776 0.850451 0.865578 0.842200 0.894014 0.826846 0.818779 0.858948 0.851205 0.832225 0.882089 0.868110 0.808976 0.813086 0.849273 0.804020 0.895872 0.656003 0.618120 0.591763 0.623392 0.643721 0.677579 0.666325 0.682814 0.662240 0.645508 0.633275 0.536613 0.959950 0.953654 0.965925 0.963155 0.965026 0.970365 0.968919 0.970644 0.963422 0.966015 1.000000 0.973961 0.956213 0.958580 0.965204 0.964218 0.966506 0.970872 0.968410 0.965734 0.965667 0.963942 0.964157 0.971979 0.956454 0.956478 0.966038 0.963665 0.962288 0.965855 0.968151 0.966881 0.962357 0.964773 0.962756 0.972521 0.784834 0.745240 0.740896 0.746418 0.753251 0.753602 0.749813 0.750523 0.744999 0.752355 0.741925 0.752143 -0.347030 0.703649
trans_count', Period2019-12_M 0.461434 0.845453 0.837470 0.852151 0.852981 0.858691 0.854831 0.857003 0.856632 0.849773 0.855043 0.841385 0.864487 0.824816 0.823534 0.835840 0.855792 0.877885 0.837656 0.870269 0.877918 0.847964 0.859502 0.841384 0.896682 0.836362 0.819606 0.859462 0.845918 0.835571 0.883489 0.870231 0.813579 0.808460 0.852317 0.806325 0.897528 0.654720 0.612870 0.591939 0.621112 0.642331 0.675265 0.665693 0.680643 0.664108 0.638280 0.632905 0.538243 0.967187 0.962462 0.973813 0.971309 0.975987 0.976758 0.977139 0.978208 0.971991 0.975437 0.973961 1.000000 0.965714 0.966529 0.972510 0.971923 0.973151 0.976218 0.975463 0.978033 0.970852 0.973590 0.973751 0.980773 0.965804 0.965623 0.971334 0.970523 0.973583 0.975204 0.976127 0.975600 0.971847 0.972297 0.971675 0.979625 0.791597 0.752245 0.747135 0.752619 0.761707 0.761485 0.757786 0.759768 0.756140 0.757085 0.749638 0.760988 -0.365127 0.703665
trans_count', Period2020-01_M 0.470286 0.835428 0.841366 0.848167 0.850313 0.853196 0.847633 0.851245 0.854692 0.845091 0.854185 0.833898 0.849414 0.856438 0.821145 0.832837 0.854218 0.869979 0.844153 0.870584 0.876997 0.851757 0.861607 0.839261 0.891403 0.833559 0.817766 0.861288 0.844403 0.829002 0.885047 0.865490 0.827901 0.809180 0.852969 0.797452 0.892282 0.653893 0.613016 0.595720 0.627100 0.645360 0.673515 0.659552 0.676958 0.668220 0.639239 0.635456 0.535485 0.948273 0.949588 0.959636 0.955482 0.957584 0.959636 0.960888 0.964473 0.958227 0.963051 0.956213 0.965714 1.000000 0.950697 0.957563 0.957966 0.958872 0.964039 0.961017 0.962486 0.959116 0.958571 0.959382 0.964691 0.952411 0.949695 0.958699 0.955954 0.954790 0.963460 0.960988 0.962955 0.956199 0.962035 0.954551 0.967092 0.780749 0.743570 0.736706 0.740364 0.749660 0.747283 0.746232 0.747268 0.744572 0.744963 0.736640 0.747145 -0.346319 0.703251
trans_count', Period2020-02_M 0.475649 0.840678 0.830363 0.844247 0.844559 0.848352 0.848464 0.848025 0.850126 0.843241 0.847986 0.836569 0.845170 0.818418 0.859851 0.833949 0.858991 0.873607 0.831936 0.871446 0.865883 0.846598 0.856971 0.832610 0.883090 0.828894 0.811999 0.855663 0.845624 0.824469 0.878974 0.860657 0.809988 0.801170 0.843693 0.803746 0.891653 0.654266 0.612110 0.591221 0.620218 0.638852 0.675140 0.662461 0.674987 0.662093 0.635537 0.631277 0.535265 0.952335 0.946526 0.956855 0.955960 0.957696 0.959296 0.959602 0.961022 0.960281 0.958820 0.958580 0.966529 0.950697 1.000000 0.957922 0.959820 0.962377 0.962160 0.960021 0.960126 0.958401 0.956198 0.956092 0.964238 0.951608 0.951270 0.962043 0.958085 0.961246 0.963273 0.961986 0.962092 0.956747 0.958393 0.959116 0.965633 0.778878 0.743161 0.737095 0.741507 0.749530 0.751023 0.744692 0.745271 0.745141 0.748538 0.738885 0.748008 -0.339621 0.696683
trans_count', Period2020-03_M 0.457420 0.838266 0.831369 0.847701 0.850761 0.849106 0.846618 0.852657 0.850201 0.842754 0.848301 0.833486 0.846034 0.819095 0.816803 0.859817 0.855590 0.873574 0.835972 0.867747 0.875042 0.847050 0.853675 0.840949 0.893379 0.836179 0.824670 0.858192 0.846180 0.834639 0.886203 0.864568 0.813509 0.806146 0.847782 0.803126 0.893022 0.647110 0.605830 0.587687 0.611226 0.638075 0.665131 0.655458 0.673833 0.660239 0.630193 0.625703 0.534837 0.959841 0.954489 0.964288 0.964520 0.963518 0.967160 0.969290 0.969049 0.965292 0.966804 0.965204 0.972510 0.957563 0.957922 1.000000 0.966154 0.966344 0.968227 0.966456 0.968157 0.963635 0.963901 0.963117 0.972424 0.958580 0.956884 0.964395 0.964118 0.964176 0.968115 0.966779 0.970027 0.961727 0.962930 0.964579 0.972924 0.778698 0.739112 0.733378 0.736065 0.747440 0.744063 0.741180 0.744439 0.740225 0.741849 0.735997 0.745461 -0.358257 0.698150
trans_count', Period2020-04_M 0.453956 0.831728 0.826092 0.845664 0.842492 0.845087 0.842380 0.844680 0.845209 0.837955 0.841783 0.827109 0.841669 0.812018 0.817136 0.829212 0.888219 0.878455 0.835339 0.872674 0.871847 0.849668 0.858267 0.834632 0.893720 0.832967 0.817253 0.860176 0.846650 0.839882 0.884154 0.864139 0.808143 0.801620 0.853793 0.806219 0.887143 0.643662 0.598896 0.582942 0.611938 0.633727 0.663296 0.654970 0.668325 0.653223 0.632313 0.620448 0.539324 0.958016 0.956433 0.967789 0.964024 0.965749 0.968969 0.968491 0.968013 0.963815 0.965408 0.964218 0.971923 0.957966 0.959820 0.966154 1.000000 0.970217 0.969057 0.969617 0.967957 0.963886 0.964523 0.964749 0.973288 0.956744 0.959650 0.965385 0.964611 0.967384 0.967328 0.966789 0.970366 0.962557 0.967412 0.964423 0.972256 0.776766 0.736384 0.730123 0.737487 0.748209 0.745056 0.740429 0.741835 0.738688 0.741348 0.735272 0.743318 -0.338082 0.695663
trans_count', Period2020-05_M 0.449220 0.832231 0.831807 0.844811 0.843618 0.846251 0.845728 0.847354 0.845661 0.840443 0.844232 0.829316 0.841110 0.815993 0.817325 0.829420 0.856457 0.893951 0.832715 0.870352 0.872896 0.842475 0.856794 0.838791 0.886409 0.834465 0.813023 0.854299 0.850693 0.835133 0.885528 0.868226 0.810175 0.812169 0.854646 0.802144 0.891476 0.647402 0.607639 0.586049 0.615337 0.635794 0.668375 0.662050 0.674031 0.655469 0.634111 0.621132 0.527816 0.958585 0.957202 0.968945 0.963009 0.967295 0.968955 0.970831 0.969957 0.964819 0.966243 0.966506 0.973151 0.958872 0.962377 0.966344 0.970217 1.000000 0.971519 0.968060 0.969905 0.963103 0.969427 0.965258 0.973261 0.962352 0.958761 0.963938 0.967475 0.967042 0.972031 0.971233 0.970944 0.964507 0.968716 0.965942 0.976048 0.784279 0.745808 0.740189 0.744327 0.754730 0.753696 0.749451 0.747893 0.747222 0.749482 0.740257 0.753715 -0.365077 0.704129
trans_count', Period2020-06_M 0.464791 0.843920 0.837182 0.850482 0.851377 0.854238 0.856488 0.857067 0.856239 0.848818 0.853699 0.838042 0.853370 0.829243 0.825514 0.840067 0.863245 0.877946 0.866434 0.878817 0.878615 0.854636 0.862192 0.849870 0.891179 0.842782 0.820866 0.862440 0.851523 0.841168 0.889857 0.869002 0.821607 0.810846 0.858662 0.806326 0.901088 0.657238 0.614022 0.594395 0.622076 0.646584 0.677529 0.673223 0.678393 0.669629 0.645963 0.638052 0.543886 0.963606 0.957761 0.967260 0.966872 0.967208 0.972159 0.971742 0.971117 0.965588 0.969435 0.970872 0.976218 0.964039 0.962160 0.968227 0.969057 0.971519 1.000000 0.972788 0.971405 0.968057 0.967551 0.968817 0.974926 0.963245 0.961454 0.968345 0.968492 0.966830 0.973206 0.970131 0.971464 0.966081 0.970614 0.966338 0.976460 0.785753 0.748882 0.743980 0.747076 0.759274 0.758108 0.753794 0.754195 0.750486 0.755922 0.745867 0.757086 -0.348231 0.710393
trans_count', Period2020-07_M 0.470284 0.840544 0.835129 0.848804 0.849786 0.853096 0.851485 0.853267 0.853515 0.848264 0.850074 0.837108 0.850083 0.820633 0.823889 0.832099 0.861838 0.871269 0.842639 0.901283 0.877998 0.851212 0.864724 0.843759 0.893655 0.835311 0.824153 0.858998 0.846847 0.833140 0.883911 0.870599 0.817686 0.805930 0.855844 0.803352 0.894689 0.650441 0.614259 0.588250 0.617659 0.636987 0.671868 0.663065 0.673480 0.662167 0.638019 0.628793 0.545793 0.960584 0.958784 0.967740 0.965654 0.965886 0.969935 0.970058 0.971865 0.967853 0.968722 0.968410 0.975463 0.961017 0.960021 0.966456 0.969617 0.968060 0.972788 1.000000 0.971844 0.966475 0.967544 0.966975 0.975769 0.960589 0.960040 0.967478 0.964943 0.966433 0.971137 0.969552 0.971748 0.964427 0.968909 0.966044 0.976628 0.782775 0.742504 0.737320 0.742025 0.751692 0.750248 0.744893 0.746855 0.744123 0.749437 0.739283 0.748059 -0.343487 0.703322
trans_count', Period2020-08_M 0.468283 0.832978 0.827187 0.844382 0.844117 0.846604 0.844402 0.845227 0.845346 0.842051 0.845952 0.827290 0.841418 0.814795 0.813063 0.827833 0.855086 0.867331 0.837935 0.867992 0.900458 0.845916 0.856583 0.839161 0.890150 0.840271 0.814214 0.854820 0.849836 0.829678 0.882278 0.865589 0.815161 0.800896 0.850594 0.797764 0.895363 0.655151 0.612405 0.593187 0.617275 0.642210 0.671867 0.664802 0.675484 0.665591 0.634964 0.630334 0.542361 0.962100 0.958082 0.969285 0.966105 0.970198 0.970373 0.970083 0.971139 0.967188 0.970144 0.965734 0.978033 0.962486 0.960126 0.968157 0.967957 0.969905 0.971405 0.971844 1.000000 0.968423 0.966864 0.968526 0.976906 0.966084 0.960967 0.968531 0.968172 0.966728 0.971284 0.971935 0.972227 0.968995 0.968149 0.965690 0.976594 0.790424 0.749905 0.746597 0.748057 0.760513 0.756870 0.754907 0.755740 0.751406 0.753604 0.747308 0.756881 -0.367185 0.703759
trans_count', Period2020-09_M 0.471719 0.843401 0.833601 0.853677 0.852200 0.858116 0.851387 0.856910 0.856603 0.853002 0.857104 0.843653 0.852952 0.827514 0.823819 0.839180 0.860032 0.866935 0.832636 0.872829 0.879377 0.882716 0.866177 0.849412 0.896164 0.837671 0.820814 0.864524 0.848851 0.826085 0.885690 0.873158 0.819689 0.809093 0.854147 0.800672 0.896496 0.661767 0.623802 0.597142 0.628607 0.654656 0.683249 0.670693 0.685453 0.667853 0.646139 0.638941 0.542939 0.956810 0.950765 0.964522 0.960219 0.965274 0.966377 0.967152 0.969219 0.966818 0.966738 0.965667 0.970852 0.959116 0.958401 0.963635 0.963886 0.963103 0.968057 0.966475 0.968423 1.000000 0.963245 0.965956 0.971939 0.957493 0.957013 0.967219 0.966902 0.963072 0.969581 0.969082 0.968515 0.964626 0.965520 0.961819 0.973261 0.787757 0.749815 0.745030 0.748167 0.759267 0.756972 0.755099 0.753857 0.749845 0.753261 0.746290 0.755251 -0.357035 0.707639
trans_count', Period2020-10_M 0.454737 0.831950 0.825370 0.837464 0.839617 0.844983 0.841978 0.842750 0.841874 0.839597 0.841540 0.824964 0.838939 0.811494 0.808763 0.823068 0.853030 0.865063 0.831023 0.864773 0.867329 0.841136 0.891589 0.830690 0.891328 0.827259 0.811606 0.848891 0.841230 0.828289 0.878975 0.867138 0.808631 0.812074 0.851381 0.801389 0.888408 0.636302 0.594838 0.574834 0.602951 0.620289 0.656333 0.645587 0.663608 0.647007 0.621966 0.610827 0.533050 0.957884 0.952482 0.963488 0.961558 0.966138 0.966962 0.967789 0.968834 0.964603 0.964837 0.963942 0.973590 0.958571 0.956198 0.963901 0.964523 0.969427 0.967551 0.967544 0.966864 0.963245 1.000000 0.962026 0.972111 0.957642 0.958227 0.964119 0.964959 0.962829 0.967311 0.969434 0.968051 0.964721 0.965267 0.964728 0.972266 0.771896 0.732712 0.728545 0.734611 0.741723 0.741505 0.737114 0.739946 0.736533 0.736975 0.730454 0.741615 -0.348091 0.688113
trans_count', Period2020-11_M 0.459383 0.838680 0.834284 0.849299 0.848311 0.852801 0.850541 0.854140 0.855877 0.845751 0.851632 0.835540 0.850622 0.822874 0.819418 0.831793 0.859128 0.874658 0.836407 0.871239 0.877149 0.843593 0.856815 0.876984 0.893927 0.834488 0.815329 0.855297 0.847128 0.835415 0.883382 0.869573 0.812972 0.804639 0.856394 0.800262 0.896863 0.654463 0.614687 0.591694 0.625915 0.644315 0.674232 0.665987 0.676948 0.660919 0.643391 0.626596 0.527431 0.955395 0.955189 0.964573 0.962062 0.963385 0.966314 0.968827 0.968869 0.963260 0.965703 0.964157 0.973751 0.959382 0.956092 0.963117 0.964749 0.965258 0.968817 0.966975 0.968526 0.965956 0.962026 1.000000 0.971427 0.957782 0.956579 0.964056 0.962994 0.963997 0.967259 0.965533 0.967466 0.962411 0.966568 0.962174 0.973341 0.786643 0.745893 0.739723 0.745997 0.756297 0.752521 0.751304 0.749183 0.745365 0.751414 0.740298 0.752085 -0.350098 0.704729
trans_count', Period2020-12_M 0.472859 0.850894 0.843543 0.856097 0.857726 0.861938 0.858753 0.860508 0.860508 0.854474 0.859154 0.844236 0.858225 0.830157 0.829180 0.846524 0.866113 0.877283 0.843585 0.876992 0.880929 0.857063 0.863899 0.844049 0.920369 0.840067 0.823971 0.862997 0.853788 0.838782 0.890442 0.877406 0.821615 0.809342 0.860173 0.809490 0.902532 0.657859 0.617508 0.595496 0.625993 0.646757 0.675864 0.667774 0.681382 0.665501 0.645853 0.639989 0.547133 0.965385 0.964405 0.971300 0.971439 0.972590 0.974296 0.974754 0.976053 0.972173 0.975214 0.971979 0.980773 0.964691 0.964238 0.972424 0.973288 0.973261 0.974926 0.975769 0.976906 0.971939 0.972111 0.971427 1.000000 0.964265 0.964163 0.973479 0.970136 0.973060 0.975599 0.975636 0.974635 0.970511 0.973178 0.969568 0.980249 0.789291 0.749243 0.744226 0.749153 0.759428 0.756500 0.753295 0.755440 0.750447 0.754041 0.746136 0.756286 -0.365081 0.708231
trans_count', Period2021-01_M 0.462823 0.826641 0.821560 0.829909 0.836577 0.835303 0.833486 0.833541 0.834926 0.833616 0.833951 0.819112 0.830021 0.809665 0.808745 0.822310 0.850791 0.868204 0.833885 0.863269 0.870218 0.836720 0.847512 0.834519 0.879853 0.875907 0.810237 0.853340 0.841850 0.825547 0.879342 0.860408 0.812190 0.800130 0.851020 0.801373 0.889742 0.654801 0.606810 0.587042 0.617008 0.635991 0.666366 0.670563 0.673372 0.667436 0.639584 0.633427 0.540361 0.951658 0.949921 0.957540 0.956711 0.958093 0.957975 0.958479 0.960800 0.959070 0.956953 0.956454 0.965804 0.952411 0.951608 0.958580 0.956744 0.962352 0.963245 0.960589 0.966084 0.957493 0.957642 0.957782 0.964265 1.000000 0.953376 0.959565 0.957850 0.959772 0.966900 0.964613 0.962614 0.958987 0.961808 0.958380 0.970356 0.787025 0.746139 0.743819 0.747192 0.753897 0.753546 0.754962 0.750537 0.748493 0.752365 0.744249 0.753412 -0.359618 0.701070
trans_count', Period2021-02_M 0.441932 0.825560 0.816118 0.829020 0.830156 0.836149 0.832801 0.834782 0.836035 0.824318 0.831182 0.818102 0.830879 0.803133 0.809038 0.818005 0.854158 0.858249 0.825870 0.860332 0.867196 0.830926 0.853544 0.824016 0.882983 0.834174 0.857156 0.850078 0.841145 0.829252 0.879574 0.857013 0.802824 0.798934 0.839201 0.800263 0.886847 0.633372 0.590488 0.568729 0.598934 0.626387 0.651256 0.649074 0.660545 0.643732 0.621754 0.618596 0.529033 0.952330 0.948162 0.956810 0.953235 0.957228 0.958327 0.959338 0.961154 0.953509 0.954982 0.956478 0.965623 0.949695 0.951270 0.956884 0.959650 0.958761 0.961454 0.960040 0.960967 0.957013 0.958227 0.956579 0.964163 0.953376 1.000000 0.959339 0.959525 0.958106 0.962248 0.960295 0.960365 0.957744 0.957385 0.956048 0.967637 0.769589 0.729400 0.722420 0.727557 0.736315 0.735138 0.733062 0.735412 0.730610 0.732912 0.724915 0.735258 -0.348739 0.681696
trans_count', Period2021-03_M 0.469830 0.838643 0.829301 0.845552 0.841997 0.846114 0.846650 0.846012 0.847280 0.840854 0.846880 0.831175 0.843387 0.820457 0.820969 0.831303 0.861158 0.863834 0.836477 0.872952 0.877207 0.854235 0.856093 0.845602 0.895722 0.835654 0.826597 0.887800 0.844547 0.833328 0.887600 0.869511 0.817210 0.808183 0.854586 0.804833 0.895471 0.652921 0.606438 0.590446 0.620035 0.644692 0.679262 0.666764 0.676221 0.663677 0.640763 0.634508 0.546873 0.959168 0.953344 0.965124 0.961115 0.961153 0.966465 0.966234 0.967372 0.962640 0.965155 0.966038 0.971334 0.958699 0.962043 0.964395 0.965385 0.963938 0.968345 0.967478 0.968531 0.967219 0.964119 0.964056 0.973479 0.959565 0.959339 1.000000 0.963888 0.967176 0.970081 0.968838 0.968311 0.965067 0.967836 0.962760 0.974210 0.786148 0.746810 0.743238 0.747384 0.755562 0.755217 0.751008 0.750280 0.748087 0.750889 0.741452 0.752810 -0.354280 0.704396
trans_count', Period2021-04_M 0.460774 0.828041 0.819529 0.834987 0.836232 0.837999 0.834347 0.837847 0.837990 0.833288 0.837649 0.824937 0.835539 0.811332 0.812729 0.822874 0.853805 0.870206 0.831869 0.866749 0.870659 0.844610 0.860606 0.836483 0.888803 0.838788 0.822809 0.854594 0.878431 0.825350 0.882491 0.867627 0.820722 0.800815 0.851739 0.803776 0.892859 0.643644 0.596235 0.573805 0.601691 0.628172 0.659500 0.654658 0.669028 0.653002 0.624286 0.630324 0.539859 0.956031 0.951654 0.963648 0.962026 0.962342 0.961785 0.964588 0.965271 0.963302 0.962992 0.963665 0.970523 0.955954 0.958085 0.964118 0.964611 0.967475 0.968492 0.964943 0.968172 0.966902 0.964959 0.962994 0.970136 0.957850 0.959525 0.963888 1.000000 0.964376 0.969328 0.969209 0.970394 0.966030 0.966264 0.963272 0.974138 0.779790 0.739221 0.734480 0.738094 0.747600 0.743250 0.741479 0.744841 0.739372 0.742116 0.736947 0.746653 -0.350138 0.696229
trans_count', Period2021-05_M 0.456129 0.834356 0.829209 0.847176 0.846324 0.852313 0.845410 0.850273 0.849375 0.842423 0.845420 0.832278 0.845257 0.815117 0.821374 0.835737 0.858827 0.869265 0.835499 0.866498 0.871851 0.843989 0.855702 0.838458 0.890501 0.836818 0.816305 0.858874 0.846598 0.864922 0.884063 0.868917 0.815768 0.803747 0.853495 0.806309 0.893501 0.653296 0.614028 0.598556 0.623964 0.641113 0.675735 0.672850 0.685333 0.665010 0.646771 0.632335 0.532083 0.954068 0.953512 0.966080 0.962135 0.965299 0.965365 0.967582 0.968344 0.962944 0.962608 0.962288 0.973583 0.954790 0.961246 0.964176 0.967384 0.967042 0.966830 0.966433 0.966728 0.963072 0.962829 0.963997 0.973060 0.959772 0.958106 0.967176 0.964376 1.000000 0.969670 0.968215 0.971048 0.964092 0.966678 0.964735 0.974387 0.787577 0.751805 0.744920 0.750310 0.757040 0.758855 0.755219 0.755059 0.753025 0.756473 0.746699 0.758888 -0.362948 0.708239
trans_count', Period2021-06_M 0.464774 0.837536 0.833878 0.846073 0.846305 0.847810 0.846228 0.848157 0.849608 0.842850 0.847209 0.833911 0.845331 0.819760 0.824557 0.834130 0.855394 0.869753 0.835679 0.871740 0.870259 0.849042 0.863612 0.847664 0.895285 0.843795 0.825359 0.862108 0.852480 0.835499 0.912472 0.872879 0.815288 0.811085 0.856025 0.800385 0.900457 0.659509 0.614823 0.593725 0.622822 0.643251 0.680391 0.673810 0.681682 0.666665 0.647751 0.635841 0.544141 0.958808 0.956888 0.967744 0.964745 0.966790 0.968006 0.968223 0.970177 0.965442 0.965642 0.965855 0.975204 0.963460 0.963273 0.968115 0.967328 0.972031 0.973206 0.971137 0.971284 0.969581 0.967311 0.967259 0.975599 0.966900 0.962248 0.970081 0.969328 0.969670 1.000000 0.971372 0.970706 0.970282 0.971336 0.968639 0.978535 0.790543 0.750392 0.746497 0.750463 0.758839 0.758136 0.756929 0.754762 0.749928 0.756802 0.746470 0.756788 -0.366696 0.709025
trans_count', Period2021-07_M 0.463455 0.833660 0.829774 0.839410 0.841950 0.846822 0.842755 0.844969 0.845519 0.841096 0.843144 0.829447 0.840239 0.815666 0.816931 0.826848 0.858764 0.872193 0.829580 0.865096 0.876273 0.853595 0.859499 0.839343 0.891745 0.838308 0.817188 0.857279 0.844497 0.830185 0.884522 0.889547 0.814668 0.809659 0.853465 0.805457 0.893634 0.654579 0.611080 0.589418 0.614901 0.636049 0.673694 0.661167 0.678806 0.664568 0.634153 0.627578 0.543936 0.958951 0.957682 0.966647 0.963313 0.967326 0.968356 0.968941 0.971664 0.969243 0.966136 0.968151 0.976127 0.960988 0.961986 0.966779 0.966789 0.971233 0.970131 0.969552 0.971935 0.969082 0.969434 0.965533 0.975636 0.964613 0.960295 0.968838 0.969209 0.968215 0.971372 1.000000 0.970665 0.968743 0.969643 0.966476 0.976197 0.790116 0.749233 0.746122 0.747367 0.756898 0.756881 0.754701 0.753685 0.752369 0.752960 0.744341 0.755379 -0.355006 0.704450
trans_count', Period2021-08_M 0.463294 0.835547 0.835839 0.845161 0.845408 0.849642 0.848757 0.848376 0.849981 0.844746 0.847269 0.833690 0.844223 0.818845 0.820168 0.833697 0.861111 0.881120 0.836382 0.874206 0.879403 0.853383 0.853258 0.842186 0.890474 0.838745 0.820443 0.857385 0.850550 0.841012 0.886666 0.867957 0.839336 0.813269 0.851245 0.808275 0.898360 0.651456 0.606073 0.588579 0.613598 0.640025 0.671988 0.659957 0.677184 0.661227 0.634086 0.630652 0.541859 0.960164 0.959193 0.967415 0.965198 0.967738 0.970575 0.969618 0.970743 0.967643 0.967799 0.966881 0.975600 0.962955 0.962092 0.970027 0.970366 0.970944 0.971464 0.971748 0.972227 0.968515 0.968051 0.967466 0.974635 0.962614 0.960365 0.968311 0.970394 0.971048 0.970706 0.970665 1.000000 0.969315 0.970057 0.966966 0.976751 0.782921 0.744314 0.739680 0.742245 0.752505 0.750216 0.747141 0.748458 0.745450 0.746564 0.741560 0.751305 -0.355743 0.701626
trans_count', Period2021-09_M 0.458722 0.837525 0.831459 0.849971 0.845724 0.849384 0.846544 0.849497 0.851840 0.842654 0.845240 0.833769 0.845563 0.821093 0.821949 0.831932 0.859388 0.875106 0.834678 0.870715 0.870947 0.848028 0.859555 0.836770 0.891990 0.842528 0.825624 0.855611 0.849588 0.829128 0.887948 0.871057 0.815995 0.835917 0.848588 0.809525 0.895177 0.658455 0.617068 0.599657 0.625401 0.644344 0.675514 0.669148 0.683037 0.670442 0.642370 0.630006 0.533599 0.954914 0.953113 0.963395 0.960395 0.963388 0.964871 0.964286 0.967188 0.961759 0.961547 0.962357 0.971847 0.956199 0.956747 0.961727 0.962557 0.964507 0.966081 0.964427 0.968995 0.964626 0.964721 0.962411 0.970511 0.958987 0.957744 0.965067 0.966030 0.964092 0.970282 0.968743 0.969315 1.000000 0.965769 0.963908 0.971302 0.786767 0.750939 0.745028 0.748196 0.757107 0.754044 0.753202 0.753857 0.749827 0.752034 0.743931 0.754921 -0.362175 0.708332
trans_count', Period2021-10_M 0.466350 0.839388 0.833806 0.845954 0.847967 0.850287 0.850755 0.852688 0.851657 0.846919 0.851644 0.832424 0.844685 0.821475 0.823526 0.831107 0.861739 0.872996 0.845524 0.874834 0.878606 0.845503 0.859741 0.846757 0.893257 0.835343 0.822236 0.861903 0.852824 0.835904 0.887688 0.873343 0.814881 0.809729 0.880462 0.803156 0.896317 0.649211 0.610966 0.591634 0.621812 0.641474 0.672309 0.663814 0.678189 0.664846 0.642102 0.633811 0.541509 0.956624 0.954428 0.965989 0.962532 0.965503 0.967798 0.968400 0.967040 0.965315 0.966302 0.964773 0.972297 0.962035 0.958393 0.962930 0.967412 0.968716 0.970614 0.968909 0.968149 0.965520 0.965267 0.966568 0.973178 0.961808 0.957385 0.967836 0.966264 0.966678 0.971336 0.969643 0.970057 0.965769 1.000000 0.966450 0.975317 0.783248 0.745005 0.739404 0.744373 0.753530 0.752964 0.749299 0.748529 0.746224 0.747870 0.740236 0.750631 -0.361170 0.706262
trans_count', Period2021-11_M 0.446187 0.830636 0.827235 0.834370 0.835301 0.842099 0.837605 0.838165 0.838768 0.833687 0.835088 0.823108 0.835941 0.808098 0.812496 0.824353 0.853786 0.864882 0.832502 0.866862 0.868488 0.835618 0.856345 0.832861 0.885006 0.833453 0.814392 0.854579 0.843407 0.827985 0.879331 0.858957 0.809838 0.803070 0.849140 0.830238 0.889986 0.644061 0.611118 0.586675 0.614304 0.635523 0.670243 0.657405 0.678653 0.658287 0.637462 0.625715 0.533113 0.957738 0.955970 0.963370 0.959376 0.964357 0.964403 0.963448 0.964941 0.963436 0.963415 0.962756 0.971675 0.954551 0.959116 0.964579 0.964423 0.965942 0.966338 0.966044 0.965690 0.961819 0.964728 0.962174 0.969568 0.958380 0.956048 0.962760 0.963272 0.964735 0.968639 0.966476 0.966966 0.963908 0.966450 1.000000 0.971746 0.786011 0.748204 0.742017 0.745742 0.756162 0.755158 0.750955 0.753536 0.749633 0.751347 0.742418 0.755374 -0.351511 0.700989
trans_count', Period2021-12_M 0.463210 0.838665 0.834301 0.849719 0.846067 0.852541 0.848754 0.852593 0.852516 0.846134 0.848271 0.835341 0.847102 0.823219 0.823028 0.836989 0.862990 0.879445 0.841700 0.877744 0.881406 0.856426 0.863875 0.848910 0.896249 0.838157 0.826159 0.863449 0.852731 0.840924 0.891030 0.876728 0.822088 0.812219 0.858108 0.812390 0.915550 0.650143 0.606469 0.585431 0.612539 0.634962 0.668539 0.667213 0.676513 0.658597 0.635385 0.629749 0.547174 0.964976 0.962359 0.973897 0.968034 0.971579 0.973892 0.974856 0.975982 0.970957 0.971664 0.972521 0.979625 0.967092 0.965633 0.972924 0.972256 0.976048 0.976460 0.976628 0.976594 0.973261 0.972266 0.973341 0.980249 0.970356 0.967637 0.974210 0.974138 0.974387 0.978535 0.976197 0.976751 0.971302 0.975317 0.971746 1.000000 0.785870 0.744862 0.740697 0.744175 0.753113 0.750121 0.749450 0.749103 0.747317 0.749160 0.741414 0.751404 -0.358288 0.703522
trans_count', Period2022-01_M 0.391451 0.711435 0.692535 0.712484 0.720980 0.720615 0.714582 0.718733 0.715572 0.717429 0.715502 0.709497 0.723941 0.700432 0.699454 0.709346 0.697603 0.699806 0.679121 0.703439 0.710189 0.704659 0.685613 0.693071 0.712751 0.698680 0.657298 0.706977 0.675977 0.682073 0.727882 0.715777 0.673349 0.671201 0.703209 0.670213 0.734516 0.870999 0.825256 0.813924 0.836083 0.858392 0.864814 0.859580 0.873827 0.853545 0.837831 0.812652 0.431212 0.775488 0.769627 0.780958 0.780228 0.783582 0.779776 0.787370 0.782188 0.781901 0.780135 0.784834 0.791597 0.780749 0.778878 0.778698 0.776766 0.784279 0.785753 0.782775 0.790424 0.787757 0.771896 0.786643 0.789291 0.787025 0.769589 0.786148 0.779790 0.787577 0.790543 0.790116 0.782921 0.786767 0.783248 0.786011 0.785870 1.000000 0.957469 0.963603 0.965188 0.967788 0.967119 0.968045 0.966760 0.964363 0.963514 0.963986 0.970047 -0.489930 0.887249
trans_count', Period2022-02_M 0.396722 0.688998 0.673323 0.689472 0.698612 0.699520 0.693278 0.700187 0.692499 0.694365 0.686305 0.686723 0.703548 0.675948 0.680356 0.685964 0.669698 0.673265 0.654875 0.671278 0.680931 0.675514 0.663714 0.664956 0.681482 0.675285 0.624002 0.677974 0.637803 0.651841 0.692078 0.687885 0.648396 0.632295 0.674024 0.642693 0.700906 0.841274 0.871897 0.831407 0.841552 0.865691 0.870410 0.869389 0.881946 0.863684 0.840799 0.816538 0.423790 0.735984 0.732719 0.741742 0.745064 0.746225 0.742189 0.752561 0.741296 0.742125 0.741882 0.745240 0.752245 0.743570 0.743161 0.739112 0.736384 0.745808 0.748882 0.742504 0.749905 0.749815 0.732712 0.745893 0.749243 0.746139 0.729400 0.746810 0.739221 0.751805 0.750392 0.749233 0.744314 0.750939 0.745005 0.748204 0.744862 0.957469 1.000000 0.967119 0.964290 0.966479 0.968972 0.970496 0.970830 0.968693 0.966457 0.968443 0.973884 -0.479164 0.890640
trans_count', Period2022-03_M 0.382438 0.678354 0.661588 0.680758 0.691083 0.687619 0.683926 0.690842 0.680239 0.682877 0.681231 0.678714 0.692590 0.664802 0.667740 0.677049 0.657860 0.662994 0.643607 0.666287 0.677098 0.669384 0.650814 0.659089 0.673947 0.666183 0.622298 0.674358 0.636735 0.645395 0.691239 0.682778 0.641121 0.632333 0.660750 0.634674 0.695031 0.847472 0.842511 0.850891 0.846604 0.869395 0.873804 0.867990 0.883200 0.859313 0.843504 0.829292 0.410006 0.731423 0.725674 0.736975 0.738785 0.740327 0.737970 0.745791 0.734382 0.735249 0.739841 0.740896 0.747135 0.736706 0.737095 0.733378 0.730123 0.740189 0.743980 0.737320 0.746597 0.745030 0.728545 0.739723 0.744226 0.743819 0.722420 0.743238 0.734480 0.744920 0.746497 0.746122 0.739680 0.745028 0.739404 0.742017 0.740697 0.963603 0.967119 1.000000 0.971317 0.973263 0.973231 0.974385 0.974895 0.970180 0.973238 0.973015 0.979845 -0.485866 0.893646
trans_count', Period2022-04_M 0.384457 0.681420 0.663667 0.683252 0.691615 0.693059 0.686240 0.692114 0.683681 0.688189 0.682674 0.686633 0.696505 0.666009 0.674279 0.678187 0.666771 0.664106 0.647125 0.663314 0.674186 0.675292 0.653688 0.657854 0.679059 0.662207 0.620498 0.674833 0.632547 0.651801 0.695940 0.684271 0.638037 0.631868 0.668346 0.636572 0.697487 0.851319 0.840038 0.829609 0.869563 0.867248 0.871857 0.867408 0.885347 0.860971 0.844079 0.813356 0.408992 0.733094 0.729947 0.741081 0.740690 0.743999 0.740574 0.749167 0.740281 0.739738 0.740984 0.746418 0.752619 0.740364 0.741507 0.736065 0.737487 0.744327 0.747076 0.742025 0.748057 0.748167 0.734611 0.745997 0.749153 0.747192 0.727557 0.747384 0.738094 0.750310 0.750463 0.747367 0.742245 0.748196 0.744373 0.745742 0.744175 0.965188 0.964290 0.971317 1.000000 0.972530 0.974927 0.975565 0.974188 0.972178 0.971752 0.971840 0.978184 -0.485535 0.888429
trans_count', Period2022-05_M 0.380524 0.693669 0.677460 0.694329 0.703747 0.703482 0.697467 0.703907 0.696728 0.696022 0.695304 0.692764 0.708274 0.681234 0.681749 0.691315 0.675192 0.678629 0.660406 0.678809 0.686733 0.682876 0.657781 0.674329 0.688090 0.673722 0.631581 0.686381 0.649073 0.658279 0.699393 0.695373 0.646214 0.635526 0.677952 0.648841 0.706582 0.851104 0.839816 0.833267 0.850124 0.887618 0.871964 0.870923 0.881052 0.859493 0.841560 0.822461 0.412935 0.746574 0.739931 0.749054 0.753052 0.754230 0.751734 0.759328 0.749740 0.747635 0.752118 0.753251 0.761707 0.749660 0.749530 0.747440 0.748209 0.754730 0.759274 0.751692 0.760513 0.759267 0.741723 0.756297 0.759428 0.753897 0.736315 0.755562 0.747600 0.757040 0.758839 0.756898 0.752505 0.757107 0.753530 0.756162 0.753113 0.967788 0.966479 0.973263 0.972530 1.000000 0.974238 0.976367 0.974511 0.972672 0.973601 0.972961 0.981247 -0.482944 0.892923
trans_count', Period2022-06_M 0.390909 0.698531 0.675784 0.697214 0.701992 0.705345 0.698274 0.705526 0.694803 0.697622 0.697443 0.694308 0.710196 0.677933 0.685305 0.688463 0.673857 0.679130 0.663682 0.675417 0.684657 0.683377 0.662728 0.669810 0.688169 0.676060 0.627268 0.686826 0.643742 0.660551 0.699280 0.691893 0.649635 0.638216 0.677476 0.648653 0.707091 0.851914 0.846499 0.835346 0.852970 0.871093 0.892583 0.869245 0.887219 0.868626 0.846076 0.824874 0.416941 0.746206 0.737556 0.750041 0.750198 0.752125 0.750194 0.758590 0.747682 0.747517 0.751737 0.753602 0.761485 0.747283 0.751023 0.744063 0.745056 0.753696 0.758108 0.750248 0.756870 0.756972 0.741505 0.752521 0.756500 0.753546 0.735138 0.755217 0.743250 0.758855 0.758136 0.756881 0.750216 0.754044 0.752964 0.755158 0.750121 0.967119 0.968972 0.973231 0.974927 0.974238 1.000000 0.976317 0.977212 0.974427 0.975974 0.975510 0.981595 -0.482791 0.892607
trans_count', Period2022-07_M 0.401394 0.689756 0.674317 0.688379 0.698905 0.700335 0.694527 0.702023 0.692115 0.696155 0.692981 0.691582 0.703130 0.677273 0.678660 0.686814 0.670431 0.669480 0.656154 0.671938 0.680064 0.678496 0.657600 0.666289 0.681234 0.678518 0.626695 0.682636 0.639502 0.656361 0.693067 0.689911 0.644750 0.637049 0.672796 0.641581 0.705079 0.850658 0.843304 0.833152 0.851385 0.871987 0.871604 0.890620 0.885121 0.863077 0.845524 0.819645 0.421444 0.741239 0.736602 0.745717 0.747987 0.750236 0.745826 0.754874 0.746185 0.745144 0.747354 0.749813 0.757786 0.746232 0.744692 0.741180 0.740429 0.749451 0.753794 0.744893 0.754907 0.755099 0.737114 0.751304 0.753295 0.754962 0.733062 0.751008 0.741479 0.755219 0.756929 0.754701 0.747141 0.753202 0.749299 0.750955 0.749450 0.968045 0.970496 0.974385 0.975565 0.976367 0.976317 1.000000 0.976161 0.973020 0.974739 0.971947 0.982790 -0.497159 0.896988
trans_count', Period2022-08_M 0.383796 0.689761 0.670360 0.690341 0.698208 0.699899 0.693276 0.700026 0.691264 0.692092 0.692508 0.689957 0.704423 0.675784 0.676305 0.686971 0.666265 0.670035 0.650962 0.668603 0.678642 0.678743 0.659911 0.663248 0.680155 0.672926 0.625790 0.677295 0.646275 0.653789 0.699583 0.690176 0.644037 0.637886 0.670578 0.645226 0.701228 0.849944 0.843900 0.833228 0.849214 0.869243 0.869112 0.869859 0.906348 0.865410 0.844315 0.825852 0.417321 0.742995 0.735203 0.747340 0.747760 0.750630 0.747912 0.755431 0.745657 0.745429 0.749050 0.750523 0.759768 0.747268 0.745271 0.744439 0.741835 0.747893 0.754195 0.746855 0.755740 0.753857 0.739946 0.749183 0.755440 0.750537 0.735412 0.750280 0.744841 0.755059 0.754762 0.753685 0.748458 0.753857 0.748529 0.753536 0.749103 0.966760 0.970830 0.974895 0.974188 0.974511 0.977212 0.976161 1.000000 0.975816 0.974499 0.976368 0.982610 -0.494344 0.895401
trans_count', Period2022-09_M 0.388190 0.684164 0.668236 0.682826 0.690579 0.690638 0.684839 0.693268 0.685406 0.685192 0.681990 0.679749 0.695758 0.668616 0.671894 0.679282 0.660779 0.673062 0.647449 0.667312 0.679077 0.674979 0.654772 0.656360 0.674944 0.670954 0.625327 0.675826 0.634655 0.646701 0.687320 0.688407 0.641402 0.628512 0.672335 0.646185 0.699030 0.845167 0.843595 0.828886 0.845094 0.865247 0.868073 0.864652 0.879271 0.881175 0.841544 0.818578 0.416938 0.740085 0.734834 0.743815 0.746225 0.747060 0.744242 0.752706 0.743632 0.743720 0.743306 0.744999 0.756140 0.744572 0.745141 0.740225 0.738688 0.747222 0.750486 0.744123 0.751406 0.749845 0.736533 0.745365 0.750447 0.748493 0.730610 0.748087 0.739372 0.753025 0.749928 0.752369 0.745450 0.749827 0.746224 0.749633 0.747317 0.964363 0.968693 0.970180 0.972178 0.972672 0.974427 0.973020 0.975816 1.000000 0.970902 0.971544 0.979117 -0.485123 0.890208
trans_count', Period2022-10_M 0.382005 0.692214 0.671368 0.691742 0.701867 0.701191 0.695884 0.703741 0.693420 0.695329 0.692692 0.695174 0.706967 0.675463 0.683872 0.691428 0.674390 0.675682 0.656761 0.674469 0.684896 0.683370 0.662381 0.667404 0.684281 0.678259 0.630377 0.680370 0.644966 0.662196 0.705856 0.688227 0.648356 0.639768 0.675826 0.645588 0.706657 0.854856 0.844115 0.832791 0.852597 0.867683 0.879390 0.875819 0.885680 0.859560 0.870270 0.821202 0.409012 0.741331 0.733239 0.744029 0.747917 0.747656 0.744891 0.754414 0.744906 0.742906 0.746829 0.752355 0.757085 0.744963 0.748538 0.741849 0.741348 0.749482 0.755922 0.749437 0.753604 0.753261 0.736975 0.751414 0.754041 0.752365 0.732912 0.750889 0.742116 0.756473 0.756802 0.752960 0.746564 0.752034 0.747870 0.751347 0.749160 0.963514 0.966457 0.973238 0.971752 0.973601 0.975974 0.974739 0.974499 0.970902 1.000000 0.971222 0.978947 -0.474939 0.894353
trans_count', Period2022-11_M 0.384506 0.681917 0.666378 0.687030 0.694853 0.692597 0.688481 0.692095 0.685243 0.687185 0.686582 0.684785 0.699163 0.667332 0.670156 0.682229 0.661446 0.666054 0.647089 0.667621 0.673862 0.672602 0.653144 0.652119 0.676305 0.668173 0.617407 0.673343 0.633539 0.651326 0.692138 0.680297 0.633065 0.632765 0.664605 0.640701 0.695593 0.849208 0.845864 0.833495 0.847493 0.868579 0.874152 0.863439 0.884814 0.863772 0.842455 0.847240 0.412859 0.732299 0.727435 0.739627 0.741938 0.740346 0.739755 0.744880 0.737392 0.736241 0.741341 0.741925 0.749638 0.736640 0.738885 0.735997 0.735272 0.740257 0.745867 0.739283 0.747308 0.746290 0.730454 0.740298 0.746136 0.744249 0.724915 0.741452 0.736947 0.746699 0.746470 0.744341 0.741560 0.743931 0.740236 0.742418 0.741414 0.963986 0.968443 0.973015 0.971840 0.972961 0.975510 0.971947 0.976368 0.971544 0.971222 1.000000 0.978277 -0.488068 0.892340
trans_count', Period2022-12_M 0.386503 0.691093 0.672320 0.691127 0.699185 0.699547 0.693433 0.700103 0.691406 0.694720 0.691326 0.689249 0.703618 0.675469 0.675368 0.687648 0.671346 0.674330 0.655558 0.673420 0.681997 0.677314 0.658319 0.668906 0.682646 0.673478 0.621871 0.682048 0.643505 0.658233 0.699529 0.691764 0.640839 0.635479 0.675287 0.644104 0.701500 0.850431 0.845610 0.836847 0.848946 0.873182 0.873739 0.873687 0.888905 0.866008 0.840952 0.822226 0.414490 0.744807 0.737034 0.749861 0.750305 0.752428 0.748385 0.757298 0.746790 0.747488 0.749673 0.752143 0.760988 0.747145 0.748008 0.745461 0.743318 0.753715 0.757086 0.748059 0.756881 0.755251 0.741615 0.752085 0.756286 0.753412 0.735258 0.752810 0.746653 0.758888 0.756788 0.755379 0.751305 0.754921 0.750631 0.755374 0.751404 0.970047 0.973884 0.979845 0.978184 0.981247 0.981595 0.982790 0.982610 0.979117 0.978947 0.978277 1.000000 -0.496004 0.906172
age -0.183845 -0.314821 -0.308327 -0.317154 -0.318468 -0.320345 -0.327645 -0.330162 -0.318723 -0.324954 -0.317660 -0.312195 -0.313784 -0.305790 -0.291454 -0.316831 -0.248190 -0.280079 -0.253134 -0.250594 -0.262369 -0.263945 -0.236607 -0.257556 -0.266159 -0.287751 -0.256310 -0.264663 -0.265520 -0.281386 -0.316324 -0.302711 -0.287198 -0.275483 -0.292653 -0.277398 -0.318287 -0.414632 -0.410968 -0.401775 -0.413467 -0.410973 -0.394793 -0.395597 -0.404256 -0.402856 -0.386815 -0.398418 -0.188891 -0.347326 -0.343110 -0.357593 -0.347712 -0.351065 -0.354487 -0.352189 -0.349196 -0.351742 -0.346931 -0.347030 -0.365127 -0.346319 -0.339621 -0.358257 -0.338082 -0.365077 -0.348231 -0.343487 -0.367185 -0.357035 -0.348091 -0.350098 -0.365081 -0.359618 -0.348739 -0.354280 -0.350138 -0.362948 -0.366696 -0.355006 -0.355743 -0.362175 -0.361170 -0.351511 -0.358288 -0.489930 -0.479164 -0.485866 -0.485535 -0.482944 -0.482791 -0.497159 -0.494344 -0.485123 -0.474939 -0.488068 -0.496004 1.000000 -0.414863
Target 0.434274 0.756122 0.736216 0.754889 0.760828 0.762687 0.758815 0.767414 0.757570 0.761886 0.754560 0.754430 0.771768 0.752379 0.748464 0.762595 0.697025 0.709183 0.672751 0.701580 0.694828 0.693890 0.674775 0.689751 0.706695 0.688967 0.638166 0.711260 0.666261 0.678258 0.709477 0.722674 0.651812 0.662267 0.709599 0.666978 0.731005 0.881193 0.882408 0.875140 0.884264 0.895476 0.873737 0.877524 0.887157 0.878777 0.855533 0.831043 0.399576 0.692425 0.687360 0.696378 0.698029 0.699534 0.697402 0.706792 0.694985 0.699467 0.698112 0.703649 0.703665 0.703251 0.696683 0.698150 0.695663 0.704129 0.710393 0.703322 0.703759 0.707639 0.688113 0.704729 0.708231 0.701070 0.681696 0.704396 0.696229 0.708239 0.709025 0.704450 0.701626 0.708332 0.706262 0.700989 0.703522 0.887249 0.890640 0.893646 0.888429 0.892923 0.892607 0.896988 0.895401 0.890208 0.894353 0.892340 0.906172 -0.414863 1.000000

Splitting the Data¶

In [317]:
from sklearn.model_selection import train_test_split

data_df, test_df = train_test_split(trans_df, test_size=0.25, random_state=42)

train_df, val_df = train_test_split(data_df, test_size=0.25, random_state=42)

Peform One-hot Encoding and Frequency Encoding¶

One-hot Encoding¶

In [318]:
from sklearn.preprocessing import OneHotEncoder

- gender¶

In [319]:
train_gender = train_df['gender'].values.reshape(-1, 1)
val_gender = val_df['gender'].values.reshape(-1, 1)
test_gender = test_df['gender'].values.reshape(-1, 1)
In [320]:
encoder = OneHotEncoder(sparse=False, handle_unknown='ignore')
encoder.fit(train_gender)
/opt/homebrew/lib/python3.10/site-packages/sklearn/preprocessing/_encoders.py:868: FutureWarning: `sparse` was renamed to `sparse_output` in version 1.2 and will be removed in 1.4. `sparse_output` is ignored unless you leave `sparse` to its default value.
  warnings.warn(
Out[320]:
OneHotEncoder(handle_unknown='ignore', sparse=False, sparse_output=False)
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
OneHotEncoder(handle_unknown='ignore', sparse=False, sparse_output=False)
In [321]:
train_gender_encoded = encoder.transform(train_gender)
val_gender_encoded = encoder.transform(val_gender)
test_gender_encoded = encoder.transform(test_gender)
In [322]:
train_encoded = pd.DataFrame(train_gender_encoded, columns=encoder.get_feature_names_out(['gender']), index=train_df.index)
val_encoded = pd.DataFrame(val_gender_encoded, columns=encoder.get_feature_names_out(['gender']), index=val_df.index)
test_encoded = pd.DataFrame(test_gender_encoded, columns=encoder.get_feature_names_out(['gender']), index=test_df.index)
In [323]:
train_df = pd.concat([train_df.drop('gender', axis=1), train_encoded], axis=1)
val_df = pd.concat([val_df.drop('gender', axis=1), val_encoded], axis=1)
test_df = pd.concat([test_df.drop('gender', axis=1), test_encoded], axis=1)
In [324]:
train_df.head()
Out[324]:
acct_num total_amt', Period2018-12_M total_amt', Period2019-01_M total_amt', Period2019-02_M total_amt', Period2019-03_M total_amt', Period2019-04_M total_amt', Period2019-05_M total_amt', Period2019-06_M total_amt', Period2019-07_M total_amt', Period2019-08_M total_amt', Period2019-09_M total_amt', Period2019-10_M total_amt', Period2019-11_M total_amt', Period2019-12_M total_amt', Period2020-01_M total_amt', Period2020-02_M total_amt', Period2020-03_M total_amt', Period2020-04_M total_amt', Period2020-05_M total_amt', Period2020-06_M total_amt', Period2020-07_M total_amt', Period2020-08_M total_amt', Period2020-09_M total_amt', Period2020-10_M total_amt', Period2020-11_M total_amt', Period2020-12_M total_amt', Period2021-01_M total_amt', Period2021-02_M total_amt', Period2021-03_M total_amt', Period2021-04_M total_amt', Period2021-05_M total_amt', Period2021-06_M total_amt', Period2021-07_M total_amt', Period2021-08_M total_amt', Period2021-09_M total_amt', Period2021-10_M total_amt', Period2021-11_M total_amt', Period2021-12_M total_amt', Period2022-01_M total_amt', Period2022-02_M total_amt', Period2022-03_M total_amt', Period2022-04_M total_amt', Period2022-05_M total_amt', Period2022-06_M total_amt', Period2022-07_M total_amt', Period2022-08_M total_amt', Period2022-09_M total_amt', Period2022-10_M total_amt', Period2022-11_M trans_count', Period2018-12_M trans_count', Period2019-01_M trans_count', Period2019-02_M trans_count', Period2019-03_M trans_count', Period2019-04_M trans_count', Period2019-05_M trans_count', Period2019-06_M trans_count', Period2019-07_M trans_count', Period2019-08_M trans_count', Period2019-09_M trans_count', Period2019-10_M trans_count', Period2019-11_M trans_count', Period2019-12_M trans_count', Period2020-01_M trans_count', Period2020-02_M trans_count', Period2020-03_M trans_count', Period2020-04_M trans_count', Period2020-05_M trans_count', Period2020-06_M trans_count', Period2020-07_M trans_count', Period2020-08_M trans_count', Period2020-09_M trans_count', Period2020-10_M trans_count', Period2020-11_M trans_count', Period2020-12_M trans_count', Period2021-01_M trans_count', Period2021-02_M trans_count', Period2021-03_M trans_count', Period2021-04_M trans_count', Period2021-05_M trans_count', Period2021-06_M trans_count', Period2021-07_M trans_count', Period2021-08_M trans_count', Period2021-09_M trans_count', Period2021-10_M trans_count', Period2021-11_M trans_count', Period2021-12_M trans_count', Period2022-01_M trans_count', Period2022-02_M trans_count', Period2022-03_M trans_count', Period2022-04_M trans_count', Period2022-05_M trans_count', Period2022-06_M trans_count', Period2022-07_M trans_count', Period2022-08_M trans_count', Period2022-09_M trans_count', Period2022-10_M trans_count', Period2022-11_M trans_count', Period2022-12_M age job city age_group Target gender_F gender_M
775 804015362037 0.00 3123.45 3105.97 5041.83 4733.73 5936.18 6003.54 6051.26 6367.30 4736.08 5356.48 3346.69 7675.86 6310.53 1377.63 3287.53 2139.62 3903.18 3003.03 4367.25 5236.80 6016.03 3291.25 4962.00 8367.57 2246.67 2179.63 5037.71 4771.50 4452.99 4557.49 3782.78 4488.08 6007.86 4082.10 6886.50 8889.13 3602.92 1668.74 2884.05 2848.66 2693.22 10212.05 2734.09 3644.57 2627.50 3793.83 2611.68 0 51 55 90 85 98 100 103 114 82 94 83 174 51 59 93 70 95 95 94 125 81 78 83 163 60 55 86 86 88 87 106 108 69 77 83 174 78 52 73 81 98 87 108 107 73 81 67 180 17 Social research officer, government Alexandria <18 9043.66 0.0 1.0
387 393298413208 835.23 13984.16 13072.89 22517.81 18581.44 19171.91 24378.44 23321.76 23096.50 16874.02 22011.31 21538.35 36467.39 13150.15 15694.78 19229.05 14852.31 14146.26 10714.28 17610.24 13843.86 11823.63 17891.55 11184.49 28122.69 8620.69 9220.63 11166.28 12700.94 13819.21 13263.25 15326.90 16453.47 12934.47 10507.25 12008.29 24896.22 23180.77 18767.65 30773.95 31211.25 30385.21 29784.23 31958.44 32422.40 22131.49 26061.62 27641.94 6 115 118 182 170 177 216 198 208 162 180 196 296 114 142 158 171 183 174 193 206 162 184 139 357 116 119 163 165 188 203 212 194 173 158 173 340 212 201 335 315 341 394 445 397 304 322 326 688 36 Museum/gallery curator Martinsville 35-44 50099.15 1.0 0.0
875 904203673772 109.25 3874.83 3185.31 4219.23 4778.72 7488.23 4563.77 5527.70 6626.63 5529.02 5035.23 6595.74 14487.49 4720.18 4265.60 5156.29 4247.51 4456.70 6036.49 7903.62 6176.79 6025.58 4364.49 4731.12 11120.26 5251.95 2942.86 4276.94 6309.80 6372.30 4204.82 14146.69 6139.59 4948.50 4585.45 4052.23 11364.84 6114.07 9099.57 5951.29 9278.87 9831.60 10794.38 10221.10 19323.26 5951.54 8416.12 6541.68 1 62 56 77 81 74 91 93 111 76 89 90 166 66 59 88 78 80 92 127 103 84 80 87 175 56 59 75 74 84 78 101 113 90 73 74 174 105 104 133 157 149 162 157 177 125 147 137 306 47 Associate Professor Columbus 45-54 16953.57 1.0 0.0
167 161262210869 515.56 3907.69 4115.90 6057.44 9166.64 8864.49 7783.45 6226.43 7784.48 4629.48 6979.59 5756.26 11488.76 4907.35 3380.40 5645.12 4390.19 6692.77 4651.44 5565.36 6438.19 4567.41 4067.22 7752.15 12125.93 5030.73 4221.51 3984.55 8278.41 7064.68 6136.51 5711.87 5072.22 3478.43 4534.57 5076.26 12793.79 7403.88 5764.70 10368.75 7900.36 9252.29 13011.26 8008.01 9496.38 6655.23 7850.18 7498.56 2 63 50 96 85 102 104 107 115 80 96 91 164 66 69 83 78 103 85 92 107 89 78 99 140 67 55 70 88 77 84 111 99 68 92 101 184 103 88 180 136 136 174 144 156 121 132 139 247 48 Sales professional, IT Sherwood 45-54 12961.80 0.0 1.0
548 554990175020 309.46 2250.33 4034.83 4791.52 6821.45 5595.22 7270.58 6715.06 7131.95 4652.46 5181.26 3933.96 10511.21 3574.56 5213.22 3447.49 6180.09 5508.11 5619.60 6550.95 7972.92 6848.87 8116.71 4261.93 11158.79 5294.54 4128.81 6235.92 4698.01 6792.97 6638.87 6533.63 5404.85 4138.93 4173.73 5332.28 10626.88 2582.78 3340.69 8687.11 3793.95 4759.63 5033.70 4758.63 5631.71 4179.91 6404.53 6111.89 3 49 61 79 91 95 99 88 91 86 103 81 167 60 61 72 65 93 90 98 123 93 94 70 156 70 60 75 88 93 105 98 102 72 95 89 168 52 65 88 87 96 89 98 102 81 84 98 157 52 Acupuncturist Forks 45-54 8897.78 1.0 0.0

Frequency Encoding¶

In [325]:
frequency_map = {}
  • job
In [326]:
job_frequency = train_df['job'].value_counts(normalize=True)

train_df['job_frequency'] = train_df['job'].map(job_frequency)
val_df['job_frequency'] = val_df['job'].map(job_frequency)
test_df['job_frequency'] = test_df['job'].map(job_frequency)
In [327]:
val_df['job_frequency'].fillna(val_df['job_frequency'].mean(), inplace=True)
test_df['job_frequency'].fillna(val_df['job_frequency'].mean(), inplace=True)
  • city
In [328]:
city_frequency = train_df['city'].value_counts(normalize=True)

train_df['city_frequency'] = train_df['city'].map(city_frequency)
val_df['city_frequency'] = val_df['city'].map(city_frequency)
test_df['city_frequency'] = test_df['city'].map(city_frequency)
In [329]:
val_df['city_frequency'].fillna(val_df['city_frequency'].mean(), inplace=True)
test_df['city_frequency'].fillna(val_df['city_frequency'].mean(), inplace=True)
  • age
In [330]:
age_group_mapping = {
    '<18': 1,
    '18-24': 2,
    '25-34': 3,
    '35-44': 4,
    '45-54': 5,
    '55-64': 6,
    '65+': 7
}
In [331]:
train_df['age_group_encoded'] = train_df['age_group'].map(age_group_mapping)
val_df['age_group_encoded'] = val_df['age_group'].map(age_group_mapping)
test_df['age_group_encoded'] = test_df['age_group'].map(age_group_mapping)
In [332]:
train_df.drop(['age', 'age_group','job', 'city'], axis=1, inplace=True)
val_df.drop(['age', 'age_group', 'job', 'city'], axis=1, inplace=True)
test_df.drop(['age', 'age_group', 'job', 'city'], axis=1, inplace=True)
In [333]:
train_df.head()
Out[333]:
acct_num total_amt', Period2018-12_M total_amt', Period2019-01_M total_amt', Period2019-02_M total_amt', Period2019-03_M total_amt', Period2019-04_M total_amt', Period2019-05_M total_amt', Period2019-06_M total_amt', Period2019-07_M total_amt', Period2019-08_M total_amt', Period2019-09_M total_amt', Period2019-10_M total_amt', Period2019-11_M total_amt', Period2019-12_M total_amt', Period2020-01_M total_amt', Period2020-02_M total_amt', Period2020-03_M total_amt', Period2020-04_M total_amt', Period2020-05_M total_amt', Period2020-06_M total_amt', Period2020-07_M total_amt', Period2020-08_M total_amt', Period2020-09_M total_amt', Period2020-10_M total_amt', Period2020-11_M total_amt', Period2020-12_M total_amt', Period2021-01_M total_amt', Period2021-02_M total_amt', Period2021-03_M total_amt', Period2021-04_M total_amt', Period2021-05_M total_amt', Period2021-06_M total_amt', Period2021-07_M total_amt', Period2021-08_M total_amt', Period2021-09_M total_amt', Period2021-10_M total_amt', Period2021-11_M total_amt', Period2021-12_M total_amt', Period2022-01_M total_amt', Period2022-02_M total_amt', Period2022-03_M total_amt', Period2022-04_M total_amt', Period2022-05_M total_amt', Period2022-06_M total_amt', Period2022-07_M total_amt', Period2022-08_M total_amt', Period2022-09_M total_amt', Period2022-10_M total_amt', Period2022-11_M trans_count', Period2018-12_M trans_count', Period2019-01_M trans_count', Period2019-02_M trans_count', Period2019-03_M trans_count', Period2019-04_M trans_count', Period2019-05_M trans_count', Period2019-06_M trans_count', Period2019-07_M trans_count', Period2019-08_M trans_count', Period2019-09_M trans_count', Period2019-10_M trans_count', Period2019-11_M trans_count', Period2019-12_M trans_count', Period2020-01_M trans_count', Period2020-02_M trans_count', Period2020-03_M trans_count', Period2020-04_M trans_count', Period2020-05_M trans_count', Period2020-06_M trans_count', Period2020-07_M trans_count', Period2020-08_M trans_count', Period2020-09_M trans_count', Period2020-10_M trans_count', Period2020-11_M trans_count', Period2020-12_M trans_count', Period2021-01_M trans_count', Period2021-02_M trans_count', Period2021-03_M trans_count', Period2021-04_M trans_count', Period2021-05_M trans_count', Period2021-06_M trans_count', Period2021-07_M trans_count', Period2021-08_M trans_count', Period2021-09_M trans_count', Period2021-10_M trans_count', Period2021-11_M trans_count', Period2021-12_M trans_count', Period2022-01_M trans_count', Period2022-02_M trans_count', Period2022-03_M trans_count', Period2022-04_M trans_count', Period2022-05_M trans_count', Period2022-06_M trans_count', Period2022-07_M trans_count', Period2022-08_M trans_count', Period2022-09_M trans_count', Period2022-10_M trans_count', Period2022-11_M trans_count', Period2022-12_M Target gender_F gender_M job_frequency city_frequency age_group_encoded
775 804015362037 0.00 3123.45 3105.97 5041.83 4733.73 5936.18 6003.54 6051.26 6367.30 4736.08 5356.48 3346.69 7675.86 6310.53 1377.63 3287.53 2139.62 3903.18 3003.03 4367.25 5236.80 6016.03 3291.25 4962.00 8367.57 2246.67 2179.63 5037.71 4771.50 4452.99 4557.49 3782.78 4488.08 6007.86 4082.10 6886.50 8889.13 3602.92 1668.74 2884.05 2848.66 2693.22 10212.05 2734.09 3644.57 2627.50 3793.83 2611.68 0 51 55 90 85 98 100 103 114 82 94 83 174 51 59 93 70 95 95 94 125 81 78 83 163 60 55 86 86 88 87 106 108 69 77 83 174 78 52 73 81 98 87 108 107 73 81 67 180 9043.66 0.0 1.0 0.005495 0.005495 1
387 393298413208 835.23 13984.16 13072.89 22517.81 18581.44 19171.91 24378.44 23321.76 23096.50 16874.02 22011.31 21538.35 36467.39 13150.15 15694.78 19229.05 14852.31 14146.26 10714.28 17610.24 13843.86 11823.63 17891.55 11184.49 28122.69 8620.69 9220.63 11166.28 12700.94 13819.21 13263.25 15326.90 16453.47 12934.47 10507.25 12008.29 24896.22 23180.77 18767.65 30773.95 31211.25 30385.21 29784.23 31958.44 32422.40 22131.49 26061.62 27641.94 6 115 118 182 170 177 216 198 208 162 180 196 296 114 142 158 171 183 174 193 206 162 184 139 357 116 119 163 165 188 203 212 194 173 158 173 340 212 201 335 315 341 394 445 397 304 322 326 688 50099.15 1.0 0.0 0.001832 0.001832 4
875 904203673772 109.25 3874.83 3185.31 4219.23 4778.72 7488.23 4563.77 5527.70 6626.63 5529.02 5035.23 6595.74 14487.49 4720.18 4265.60 5156.29 4247.51 4456.70 6036.49 7903.62 6176.79 6025.58 4364.49 4731.12 11120.26 5251.95 2942.86 4276.94 6309.80 6372.30 4204.82 14146.69 6139.59 4948.50 4585.45 4052.23 11364.84 6114.07 9099.57 5951.29 9278.87 9831.60 10794.38 10221.10 19323.26 5951.54 8416.12 6541.68 1 62 56 77 81 74 91 93 111 76 89 90 166 66 59 88 78 80 92 127 103 84 80 87 175 56 59 75 74 84 78 101 113 90 73 74 174 105 104 133 157 149 162 157 177 125 147 137 306 16953.57 1.0 0.0 0.005495 0.001832 5
167 161262210869 515.56 3907.69 4115.90 6057.44 9166.64 8864.49 7783.45 6226.43 7784.48 4629.48 6979.59 5756.26 11488.76 4907.35 3380.40 5645.12 4390.19 6692.77 4651.44 5565.36 6438.19 4567.41 4067.22 7752.15 12125.93 5030.73 4221.51 3984.55 8278.41 7064.68 6136.51 5711.87 5072.22 3478.43 4534.57 5076.26 12793.79 7403.88 5764.70 10368.75 7900.36 9252.29 13011.26 8008.01 9496.38 6655.23 7850.18 7498.56 2 63 50 96 85 102 104 107 115 80 96 91 164 66 69 83 78 103 85 92 107 89 78 99 140 67 55 70 88 77 84 111 99 68 92 101 184 103 88 180 136 136 174 144 156 121 132 139 247 12961.80 0.0 1.0 0.001832 0.001832 5
548 554990175020 309.46 2250.33 4034.83 4791.52 6821.45 5595.22 7270.58 6715.06 7131.95 4652.46 5181.26 3933.96 10511.21 3574.56 5213.22 3447.49 6180.09 5508.11 5619.60 6550.95 7972.92 6848.87 8116.71 4261.93 11158.79 5294.54 4128.81 6235.92 4698.01 6792.97 6638.87 6533.63 5404.85 4138.93 4173.73 5332.28 10626.88 2582.78 3340.69 8687.11 3793.95 4759.63 5033.70 4758.63 5631.71 4179.91 6404.53 6111.89 3 49 61 79 91 95 99 88 91 86 103 81 167 60 61 72 65 93 90 98 123 93 94 70 156 70 60 75 88 93 105 98 102 72 95 89 168 52 65 88 87 96 89 98 102 81 84 98 157 8897.78 1.0 0.0 0.007326 0.001832 5
In [334]:
train_df.isna().sum()
Out[334]:
acct_num                         0
total_amt', Period2018-12_M      0
total_amt', Period2019-01_M      0
total_amt', Period2019-02_M      0
total_amt', Period2019-03_M      0
total_amt', Period2019-04_M      0
total_amt', Period2019-05_M      0
total_amt', Period2019-06_M      0
total_amt', Period2019-07_M      0
total_amt', Period2019-08_M      0
total_amt', Period2019-09_M      0
total_amt', Period2019-10_M      0
total_amt', Period2019-11_M      0
total_amt', Period2019-12_M      0
total_amt', Period2020-01_M      0
total_amt', Period2020-02_M      0
total_amt', Period2020-03_M      0
total_amt', Period2020-04_M      0
total_amt', Period2020-05_M      0
total_amt', Period2020-06_M      0
total_amt', Period2020-07_M      0
total_amt', Period2020-08_M      0
total_amt', Period2020-09_M      0
total_amt', Period2020-10_M      0
total_amt', Period2020-11_M      0
total_amt', Period2020-12_M      0
total_amt', Period2021-01_M      0
total_amt', Period2021-02_M      0
total_amt', Period2021-03_M      0
total_amt', Period2021-04_M      0
total_amt', Period2021-05_M      0
total_amt', Period2021-06_M      0
total_amt', Period2021-07_M      0
total_amt', Period2021-08_M      0
total_amt', Period2021-09_M      0
total_amt', Period2021-10_M      0
total_amt', Period2021-11_M      0
total_amt', Period2021-12_M      0
total_amt', Period2022-01_M      0
total_amt', Period2022-02_M      0
total_amt', Period2022-03_M      0
total_amt', Period2022-04_M      0
total_amt', Period2022-05_M      0
total_amt', Period2022-06_M      0
total_amt', Period2022-07_M      0
total_amt', Period2022-08_M      0
total_amt', Period2022-09_M      0
total_amt', Period2022-10_M      0
total_amt', Period2022-11_M      0
trans_count', Period2018-12_M    0
trans_count', Period2019-01_M    0
trans_count', Period2019-02_M    0
trans_count', Period2019-03_M    0
trans_count', Period2019-04_M    0
trans_count', Period2019-05_M    0
trans_count', Period2019-06_M    0
trans_count', Period2019-07_M    0
trans_count', Period2019-08_M    0
trans_count', Period2019-09_M    0
trans_count', Period2019-10_M    0
trans_count', Period2019-11_M    0
trans_count', Period2019-12_M    0
trans_count', Period2020-01_M    0
trans_count', Period2020-02_M    0
trans_count', Period2020-03_M    0
trans_count', Period2020-04_M    0
trans_count', Period2020-05_M    0
trans_count', Period2020-06_M    0
trans_count', Period2020-07_M    0
trans_count', Period2020-08_M    0
trans_count', Period2020-09_M    0
trans_count', Period2020-10_M    0
trans_count', Period2020-11_M    0
trans_count', Period2020-12_M    0
trans_count', Period2021-01_M    0
trans_count', Period2021-02_M    0
trans_count', Period2021-03_M    0
trans_count', Period2021-04_M    0
trans_count', Period2021-05_M    0
trans_count', Period2021-06_M    0
trans_count', Period2021-07_M    0
trans_count', Period2021-08_M    0
trans_count', Period2021-09_M    0
trans_count', Period2021-10_M    0
trans_count', Period2021-11_M    0
trans_count', Period2021-12_M    0
trans_count', Period2022-01_M    0
trans_count', Period2022-02_M    0
trans_count', Period2022-03_M    0
trans_count', Period2022-04_M    0
trans_count', Period2022-05_M    0
trans_count', Period2022-06_M    0
trans_count', Period2022-07_M    0
trans_count', Period2022-08_M    0
trans_count', Period2022-09_M    0
trans_count', Period2022-10_M    0
trans_count', Period2022-11_M    0
trans_count', Period2022-12_M    0
Target                           0
gender_F                         0
gender_M                         0
job_frequency                    0
city_frequency                   0
age_group_encoded                0
dtype: int64
In [335]:
val_df.isna().sum()
Out[335]:
acct_num                         0
total_amt', Period2018-12_M      0
total_amt', Period2019-01_M      0
total_amt', Period2019-02_M      0
total_amt', Period2019-03_M      0
total_amt', Period2019-04_M      0
total_amt', Period2019-05_M      0
total_amt', Period2019-06_M      0
total_amt', Period2019-07_M      0
total_amt', Period2019-08_M      0
total_amt', Period2019-09_M      0
total_amt', Period2019-10_M      0
total_amt', Period2019-11_M      0
total_amt', Period2019-12_M      0
total_amt', Period2020-01_M      0
total_amt', Period2020-02_M      0
total_amt', Period2020-03_M      0
total_amt', Period2020-04_M      0
total_amt', Period2020-05_M      0
total_amt', Period2020-06_M      0
total_amt', Period2020-07_M      0
total_amt', Period2020-08_M      0
total_amt', Period2020-09_M      0
total_amt', Period2020-10_M      0
total_amt', Period2020-11_M      0
total_amt', Period2020-12_M      0
total_amt', Period2021-01_M      0
total_amt', Period2021-02_M      0
total_amt', Period2021-03_M      0
total_amt', Period2021-04_M      0
total_amt', Period2021-05_M      0
total_amt', Period2021-06_M      0
total_amt', Period2021-07_M      0
total_amt', Period2021-08_M      0
total_amt', Period2021-09_M      0
total_amt', Period2021-10_M      0
total_amt', Period2021-11_M      0
total_amt', Period2021-12_M      0
total_amt', Period2022-01_M      0
total_amt', Period2022-02_M      0
total_amt', Period2022-03_M      0
total_amt', Period2022-04_M      0
total_amt', Period2022-05_M      0
total_amt', Period2022-06_M      0
total_amt', Period2022-07_M      0
total_amt', Period2022-08_M      0
total_amt', Period2022-09_M      0
total_amt', Period2022-10_M      0
total_amt', Period2022-11_M      0
trans_count', Period2018-12_M    0
trans_count', Period2019-01_M    0
trans_count', Period2019-02_M    0
trans_count', Period2019-03_M    0
trans_count', Period2019-04_M    0
trans_count', Period2019-05_M    0
trans_count', Period2019-06_M    0
trans_count', Period2019-07_M    0
trans_count', Period2019-08_M    0
trans_count', Period2019-09_M    0
trans_count', Period2019-10_M    0
trans_count', Period2019-11_M    0
trans_count', Period2019-12_M    0
trans_count', Period2020-01_M    0
trans_count', Period2020-02_M    0
trans_count', Period2020-03_M    0
trans_count', Period2020-04_M    0
trans_count', Period2020-05_M    0
trans_count', Period2020-06_M    0
trans_count', Period2020-07_M    0
trans_count', Period2020-08_M    0
trans_count', Period2020-09_M    0
trans_count', Period2020-10_M    0
trans_count', Period2020-11_M    0
trans_count', Period2020-12_M    0
trans_count', Period2021-01_M    0
trans_count', Period2021-02_M    0
trans_count', Period2021-03_M    0
trans_count', Period2021-04_M    0
trans_count', Period2021-05_M    0
trans_count', Period2021-06_M    0
trans_count', Period2021-07_M    0
trans_count', Period2021-08_M    0
trans_count', Period2021-09_M    0
trans_count', Period2021-10_M    0
trans_count', Period2021-11_M    0
trans_count', Period2021-12_M    0
trans_count', Period2022-01_M    0
trans_count', Period2022-02_M    0
trans_count', Period2022-03_M    0
trans_count', Period2022-04_M    0
trans_count', Period2022-05_M    0
trans_count', Period2022-06_M    0
trans_count', Period2022-07_M    0
trans_count', Period2022-08_M    0
trans_count', Period2022-09_M    0
trans_count', Period2022-10_M    0
trans_count', Period2022-11_M    0
trans_count', Period2022-12_M    0
Target                           0
gender_F                         0
gender_M                         0
job_frequency                    0
city_frequency                   0
age_group_encoded                0
dtype: int64
In [336]:
test_df.isna().sum()
Out[336]:
acct_num                         0
total_amt', Period2018-12_M      0
total_amt', Period2019-01_M      0
total_amt', Period2019-02_M      0
total_amt', Period2019-03_M      0
total_amt', Period2019-04_M      0
total_amt', Period2019-05_M      0
total_amt', Period2019-06_M      0
total_amt', Period2019-07_M      0
total_amt', Period2019-08_M      0
total_amt', Period2019-09_M      0
total_amt', Period2019-10_M      0
total_amt', Period2019-11_M      0
total_amt', Period2019-12_M      0
total_amt', Period2020-01_M      0
total_amt', Period2020-02_M      0
total_amt', Period2020-03_M      0
total_amt', Period2020-04_M      0
total_amt', Period2020-05_M      0
total_amt', Period2020-06_M      0
total_amt', Period2020-07_M      0
total_amt', Period2020-08_M      0
total_amt', Period2020-09_M      0
total_amt', Period2020-10_M      0
total_amt', Period2020-11_M      0
total_amt', Period2020-12_M      0
total_amt', Period2021-01_M      0
total_amt', Period2021-02_M      0
total_amt', Period2021-03_M      0
total_amt', Period2021-04_M      0
total_amt', Period2021-05_M      0
total_amt', Period2021-06_M      0
total_amt', Period2021-07_M      0
total_amt', Period2021-08_M      0
total_amt', Period2021-09_M      0
total_amt', Period2021-10_M      0
total_amt', Period2021-11_M      0
total_amt', Period2021-12_M      0
total_amt', Period2022-01_M      0
total_amt', Period2022-02_M      0
total_amt', Period2022-03_M      0
total_amt', Period2022-04_M      0
total_amt', Period2022-05_M      0
total_amt', Period2022-06_M      0
total_amt', Period2022-07_M      0
total_amt', Period2022-08_M      0
total_amt', Period2022-09_M      0
total_amt', Period2022-10_M      0
total_amt', Period2022-11_M      0
trans_count', Period2018-12_M    0
trans_count', Period2019-01_M    0
trans_count', Period2019-02_M    0
trans_count', Period2019-03_M    0
trans_count', Period2019-04_M    0
trans_count', Period2019-05_M    0
trans_count', Period2019-06_M    0
trans_count', Period2019-07_M    0
trans_count', Period2019-08_M    0
trans_count', Period2019-09_M    0
trans_count', Period2019-10_M    0
trans_count', Period2019-11_M    0
trans_count', Period2019-12_M    0
trans_count', Period2020-01_M    0
trans_count', Period2020-02_M    0
trans_count', Period2020-03_M    0
trans_count', Period2020-04_M    0
trans_count', Period2020-05_M    0
trans_count', Period2020-06_M    0
trans_count', Period2020-07_M    0
trans_count', Period2020-08_M    0
trans_count', Period2020-09_M    0
trans_count', Period2020-10_M    0
trans_count', Period2020-11_M    0
trans_count', Period2020-12_M    0
trans_count', Period2021-01_M    0
trans_count', Period2021-02_M    0
trans_count', Period2021-03_M    0
trans_count', Period2021-04_M    0
trans_count', Period2021-05_M    0
trans_count', Period2021-06_M    0
trans_count', Period2021-07_M    0
trans_count', Period2021-08_M    0
trans_count', Period2021-09_M    0
trans_count', Period2021-10_M    0
trans_count', Period2021-11_M    0
trans_count', Period2021-12_M    0
trans_count', Period2022-01_M    0
trans_count', Period2022-02_M    0
trans_count', Period2022-03_M    0
trans_count', Period2022-04_M    0
trans_count', Period2022-05_M    0
trans_count', Period2022-06_M    0
trans_count', Period2022-07_M    0
trans_count', Period2022-08_M    0
trans_count', Period2022-09_M    0
trans_count', Period2022-10_M    0
trans_count', Period2022-11_M    0
trans_count', Period2022-12_M    0
Target                           0
gender_F                         0
gender_M                         0
job_frequency                    0
city_frequency                   0
age_group_encoded                0
dtype: int64
In [337]:
X_train = train_df.drop(["acct_num", "Target"], axis=1)
y_train = train_df["Target"]

X_val = val_df.drop(["acct_num", "Target"], axis=1)
y_val = val_df["Target"]

X_test = test_df.drop(["acct_num", "Target"], axis=1)
y_test = test_df["Target"]

Data Standarization (Standard Scalling)¶

In [338]:
from sklearn.preprocessing import StandardScaler
In [339]:
scaler = StandardScaler()
In [340]:
X_train_scaled = scaler.fit_transform(X_train)
X_val_scaled = scaler.transform(X_val)
X_test_scaled = scaler.transform(X_test)
In [341]:
X_train = pd.DataFrame(X_train_scaled, columns=X_train.columns)
X_val = pd.DataFrame(X_val_scaled, columns=X_val.columns)
X_test = pd.DataFrame(X_test_scaled, columns=X_test.columns)

I. Multivariate Linear Regression¶

In the regression analysis, we developed and evaluated seven models to predict the next month’s spending. These models include:

Baseline: This model serves as a benchmark and is generated using the mean of the predictor variable. It provides a reference point for evaluating the performance of other models. Multivariate Regression: A linear regression model that considers multiple predictor variables. Lasso Regression: A linear regression model that applies L1 regularization to the model coefficients, promoting sparsity and feature selection. Ridge Regression: A linear regression model that applies L2 regularization to the model coefficients, helping to reduce overfitting. ElasticNet Regression: A linear regression model that combines L1 and L2 regularization techniques, offering a balance between feature selection and model complexity control. Decision Tree Regressor: A non-linear regression model that uses decision tree algorithms. Random Forest Regressor: An ensemble learning method that combines multiple decision trees. Gradient Boosting Regressor: Another ensemble learning method that sequentially trains weak learners (decision trees) to improve model performance.

Before splitting the data, we performed standard scaling on the features to normalize their values and ensure they were on a similar scale. This step was crucial to prevent any particular feature from dominating the model training process. Additionally, we applied one-hot encoding to the ‘gender’ variable and frequency-encoding to the ‘job’ and ‘city’ columns. The ‘age_group’ variable was mapped to an ordinal type ranging from 1 to 7.

We split the data into training, validation, and testing sets to ensure a fair evaluation of the models. The data was divided so that 20% was reserved for testing, while the remaining 80% was split into training (80%) and validation (20%) sets. The performance of each model was evaluated using metrics such as mean absolute error (MAE), mean squared error (MSE), and root mean square error (RMSE) on both the training and validation sets.

Feature selection techniques, including feature engineering, were applied to all models to identify the most relevant predictors (see more details in the appendix). For the ensemble techniques (Decision Tree Regressor, Random Forest Regressor, and Gradient Boosting Regressor), hyperparameter tuning was performed to optimize their performance.

In [342]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error

Baseline model¶

Baseline Train dataset¶

In [343]:
y_mean = y_train.mean()
y_base = np.full(y_train.shape, y_mean)

mae = mean_absolute_error(y_train, y_base)
mse = mean_squared_error(y_train, y_base)
rmse = mean_squared_error(y_train, y_base, squared=False)

baseline_train = pd.DataFrame({'mae':mae,
                               'mse':mse,
                               'rmse':rmse}, index=['Baseline_Train'])
In [344]:
baseline_train
Out[344]:
mae mse rmse
Baseline_Train 8307.422361 1.105862e+08 10515.995863

Baseline Test dataset¶

In [345]:
y_mean = y_test.mean()
y_base = np.full(y_test.shape, y_mean)

mae = mean_absolute_error(y_base, y_test)
mse = mean_squared_error(y_base, y_test)
rmse = mean_squared_error(y_base, y_test, squared=False)

baseline_test = pd.DataFrame({'mae':mae,
                               'mse':mse,
                               'rmse':rmse}, index=['Baseline_Test'])
In [346]:
baseline_test
Out[346]:
mae mse rmse
Baseline_Test 7751.203825 1.000556e+08 10002.777477

Multilinear Function¶

In [347]:
def multilinear(X_train, y_train, X_test, y_test, index_train, index_test):    
    reg = LinearRegression()
    reg.fit(X_train, y_train)
    y_preds_train = reg.predict(X_train)
    y_preds_test = reg.predict(X_test)

    mse = mean_squared_error(y_train, y_preds_train)
    mae = mean_absolute_error(y_train, y_preds_train)
    rmse = mean_squared_error(y_train, y_preds_train, squared=False)

    multi_train = pd.DataFrame({'mae': mae,
                           'mse': mse,
                           'rmse': rmse},
                           index=[index_train])
    
    mse = mean_squared_error(y_test, y_preds_test)
    mae = mean_absolute_error(y_test, y_preds_test)
    rmse = mean_squared_error(y_test, y_preds_test, squared=False)

    multi_test = pd.DataFrame({'mae': mae,
                           'mse': mse,
                           'rmse': rmse},
                           index=[index_test])

    multi_models = pd.concat([multi_train, multi_test])
    

    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(16,6))

    # Plot the predicted vs actual target values for the training set
    axes[0].plot(y_train, y_preds_train, 'o', color='orange', label='Predictions')
    axes[0].plot(y_train, y_train, '-', color='red', label='Actual')

    axes[0].set_xlabel('Actual')
    axes[0].set_ylabel('Predicted')
    axes[0].set_title(f'{index_train}: Comparison of Actual vs. Predicted Target')

    axes[0].legend()


    axes[1].plot(y_test, y_preds_test, 'o', color='orange', label='Predictions')
    axes[1].plot(y_test, y_test, '-', color='red', label='Actual')

    axes[1].set_xlabel('Actual')
    axes[1].set_ylabel('Predicted')
    axes[1].set_title(f'{index_test}: Comparison of Actual vs. Predicted Target')

    axes[1].legend()
 
    return multi_models

1. Multi Linear Regression with All Feature¶

In [348]:
multi = multilinear(X_train, y_train, X_val, y_val, 'MultiLinear_Train', 'MultiLinear_Val')
multi
Out[348]:
mae mse rmse
MultiLinear_Train 2079.423872 9.298125e+06 3049.282748
MultiLinear_Val 2267.698408 1.029542e+07 3208.647385

2. Feature Engineering 1 (Correlation >= 0.5)¶

In [349]:
correlation = {}

for column in trans_df.columns:
    if column not in ['acct_num', 'Target']:
        if trans_df[column].dtype in [np.int64, np.float64]:
            correlation[column] = np.abs(round(trans_df['Target'].corr(trans_df[column]), 2))
In [350]:
sorted_correlation = dict(sorted(correlation.items(), key=lambda x:x[1], reverse=True))
In [351]:
feature_var = []

for key in sorted_correlation.keys():
    if sorted_correlation[key] > 0.5:
        feature_var.append(key)
In [352]:
X_train_feature1 = train_df[feature_var]
X_val_feature1 = val_df[feature_var]
X_test_feature1 = test_df[feature_var]
In [353]:
X_train_feature1_scaled = scaler.fit_transform(X_train_feature1)
X_val_feature1_scaled = scaler.transform(X_val_feature1)
X_test_feature1_scaled = scaler.transform(X_test_feature1)
In [354]:
X_train_feature1 = pd.DataFrame(X_train_feature1_scaled, columns=X_train_feature1.columns)
X_val_feature1 = pd.DataFrame(X_val_feature1_scaled, columns=X_val_feature1.columns)
X_test_feature1 = pd.DataFrame(X_test_feature1_scaled, columns=X_test_feature1.columns)
In [355]:
multi_feature1 = multilinear(X_train_feature1, y_train, X_val_feature1, y_val, 'MultiLinear_Feature1_Train', 'MultiLinear_Feature1_Val')
multi_feature1
Out[355]:
mae mse rmse
MultiLinear_Feature1_Train 2070.482628 9.344101e+06 3056.812263
MultiLinear_Feature1_Val 2230.827601 1.010146e+07 3178.280116

3. Feature Engineering 2 (Correlation >= 0.7)¶

In [356]:
feature_var = []

for key in sorted_correlation.keys():
    if sorted_correlation[key] > 0.7:
        feature_var.append(key)
In [357]:
X_train_feature2 = train_df[feature_var]
X_val_feature2 = val_df[feature_var]
X_test_feature2 = test_df[feature_var]
In [358]:
X_train_feature2_scaled = scaler.fit_transform(X_train_feature2)
X_val_feature2_scaled = scaler.transform(X_val_feature2)
X_test_feature2_scaled = scaler.transform(X_test_feature2)
In [359]:
X_train_feature2 = pd.DataFrame(X_train_feature2_scaled, columns=X_train_feature2.columns)
X_val_feature2 = pd.DataFrame(X_val_feature2_scaled, columns=X_val_feature2.columns)
X_test_feature2 = pd.DataFrame(X_test_feature2_scaled, columns=X_test_feature2.columns)
In [360]:
multi_feature2 = multilinear(X_train_feature2, y_train, X_val_feature2, y_val, 'MultiLinear_Feature2_Train', 'MultiLinear_Feature2_Val')
multi_feature2
Out[360]:
mae mse rmse
MultiLinear_Feature2_Train 2093.693198 1.030223e+07 3209.708360
MultiLinear_Feature2_Val 2123.415974 9.319224e+06 3052.740411

4. Feature Engineering 3 (Correlation >= 0.8)¶

In [361]:
feature_var = []

for key in sorted_correlation.keys():
    if sorted_correlation[key] > 0.8:
        feature_var.append(key)
In [362]:
X_train_feature3 = train_df[feature_var]
X_val_feature3 = val_df[feature_var]
X_test_feature3 = test_df[feature_var]
In [363]:
X_train_feature3_scaled = scaler.fit_transform(X_train_feature3)
X_val_feature3_scaled = scaler.transform(X_val_feature3)
X_test_feature3_scaled = scaler.transform(X_test_feature3)
In [364]:
X_train_feature1 = pd.DataFrame(X_train_feature3_scaled, columns=X_train_feature3.columns)
X_val_feature1 = pd.DataFrame(X_val_feature3_scaled, columns=X_val_feature3.columns)
X_test_feature1 = pd.DataFrame(X_test_feature3_scaled, columns=X_test_feature3.columns)
In [365]:
multi_feature3 = multilinear(X_train_feature3, y_train, X_val_feature3, y_val, 'MultiLinear_Feature3_Train', 'MultiLinear_Feature3_Val')
multi_feature3
Out[365]:
mae mse rmse
MultiLinear_Feature3_Train 2278.747762 1.223434e+07 3497.762131
MultiLinear_Feature3_Val 2124.895843 9.739218e+06 3120.772055
In [366]:
multi_model = pd.concat([baseline_train, baseline_test, multi, multi_feature1, multi_feature2, multi_feature3])
multi_model
Out[366]:
mae mse rmse
Baseline_Train 8307.422361 1.105862e+08 10515.995863
Baseline_Test 7751.203825 1.000556e+08 10002.777477
MultiLinear_Train 2079.423872 9.298125e+06 3049.282748
MultiLinear_Val 2267.698408 1.029542e+07 3208.647385
MultiLinear_Feature1_Train 2070.482628 9.344101e+06 3056.812263
MultiLinear_Feature1_Val 2230.827601 1.010146e+07 3178.280116
MultiLinear_Feature2_Train 2093.693198 1.030223e+07 3209.708360
MultiLinear_Feature2_Val 2123.415974 9.319224e+06 3052.740411
MultiLinear_Feature3_Train 2278.747762 1.223434e+07 3497.762131
MultiLinear_Feature3_Val 2124.895843 9.739218e+06 3120.772055

II. Lasso Model¶

In [367]:
from sklearn.linear_model import Lasso

Lasso Function¶

In [368]:
def lassomodel(X_train, y_train, X_test, y_test, index_train, index_test):
    lasso_reg = Lasso()
    lasso_reg.fit(X_train, y_train)
    y_preds_train = lasso_reg.predict(X_train)
    y_preds_test = lasso_reg.predict(X_test)

    mse = mean_squared_error(y_train, y_preds_train)
    mae = mean_absolute_error(y_train, y_preds_train)
    rmse = mean_squared_error(y_train, y_preds_train, squared=False)

    lasso_train = pd.DataFrame({'mae': mae,
                                'mse': mse,
                                'rmse': rmse},
                               index=[index_train])

    mse = mean_squared_error(y_test, y_preds_test)
    mae = mean_absolute_error(y_test, y_preds_test)
    rmse = mean_squared_error(y_test, y_preds_test, squared=False)

    lasso_test = pd.DataFrame({'mae': mae,
                               'mse': mse,
                               'rmse': rmse},
                              index=[index_test])

    lasso_models = pd.concat([lasso_train, lasso_test])

    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(16, 6))

    # Plot the predicted vs actual target values for the training set
    axes[0].plot(y_train, y_preds_train, 'o',
                 color='orange', label='Predictions')
    axes[0].plot(y_train, y_train, '-', color='red', label='Actual')

    axes[0].set_xlabel('Actual')
    axes[0].set_ylabel('Predicted')
    axes[0].set_title(
        f'{index_train}: Comparison of Actual vs. Predicted Target')

    axes[0].legend()

    axes[1].plot(y_test, y_preds_test, 'o',
                 color='orange', label='Predictions')
    axes[1].plot(y_test, y_test, '-', color='red', label='Actual')

    axes[1].set_xlabel('Actual')
    axes[1].set_ylabel('Predicted')
    axes[1].set_title(
        f'{index_test}: Comparison of Actual vs. Predicted Target')

    axes[1].legend()

    plt.show()

    return lasso_models

1. Lasso Regression with All Features¶

In [369]:
lasso = lassomodel(X_train, y_train, X_val, y_val, 'Lasso_Train', 'Lasso_Val')
lasso
/opt/homebrew/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:631: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 2.229e+09, tolerance: 6.038e+06
  model = cd_fast.enet_coordinate_descent(
Out[369]:
mae mse rmse
Lasso_Train 2093.725380 9.353681e+06 3058.378796
Lasso_Val 2279.843057 1.024395e+07 3200.616371

2. Lasso Regression Feature Engineering 1 (Correlation >= 0.5)¶

In [370]:
lasso_feature1 = lassomodel(X_train_feature1, y_train, X_val_feature1, y_val, 'Lasso_FEATURE1_Train', 'Lasso_FEATURE1_Val')
lasso_feature1
/opt/homebrew/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:631: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 1.766e+09, tolerance: 6.038e+06
  model = cd_fast.enet_coordinate_descent(
Out[370]:
mae mse rmse
Lasso_FEATURE1_Train 2278.114357 1.223520e+07 3497.885577
Lasso_FEATURE1_Val 2123.981346 9.719629e+06 3117.631893

3. Lasso Regression Feature Engineering 2 (Correlation >= 0.7)¶

In [371]:
lasso_feature2 = lassomodel(X_train_feature2, y_train, X_val_feature2, y_val, 'Lasso_FEATURE2_Train', 'Lasso_FEATURE2_Val')
lasso_feature2
/opt/homebrew/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:631: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 1.137e+09, tolerance: 6.038e+06
  model = cd_fast.enet_coordinate_descent(
Out[371]:
mae mse rmse
Lasso_FEATURE2_Train 2092.325287 1.030345e+07 3209.898411
Lasso_FEATURE2_Val 2117.264295 9.286441e+06 3047.366207

4. Lasso Regression Feature Engineering 3 (Correlation >= 0.8)¶

In [372]:
lasso_feature3 = lassomodel(X_train_feature3, y_train, X_val_feature3, y_val, 'Lasso_FEATURE3_Train', 'Lasso_FEATURE3_Val')
lasso_feature3
/opt/homebrew/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:631: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 3.340e+09, tolerance: 6.038e+06
  model = cd_fast.enet_coordinate_descent(
Out[372]:
mae mse rmse
Lasso_FEATURE3_Train 2279.940306 1.223561e+07 3497.943503
Lasso_FEATURE3_Val 2127.655356 9.759305e+06 3123.988600
In [373]:
lasso_model = pd.concat([lasso, lasso_feature1, lasso_feature2, lasso_feature3])
lasso_model
Out[373]:
mae mse rmse
Lasso_Train 2093.725380 9.353681e+06 3058.378796
Lasso_Val 2279.843057 1.024395e+07 3200.616371
Lasso_FEATURE1_Train 2278.114357 1.223520e+07 3497.885577
Lasso_FEATURE1_Val 2123.981346 9.719629e+06 3117.631893
Lasso_FEATURE2_Train 2092.325287 1.030345e+07 3209.898411
Lasso_FEATURE2_Val 2117.264295 9.286441e+06 3047.366207
Lasso_FEATURE3_Train 2279.940306 1.223561e+07 3497.943503
Lasso_FEATURE3_Val 2127.655356 9.759305e+06 3123.988600

III. Ridge Model¶

In [374]:
from sklearn.linear_model import Ridge

Ridge Function¶

In [375]:
def ridgemodel(X_train, y_train, X_test, y_test, index_train, index_test):
    ridge_reg = Ridge()
    ridge_reg.fit(X_train, y_train)
    y_preds_train = ridge_reg.predict(X_train)
    y_preds_test = ridge_reg.predict(X_test)

    mse = mean_squared_error(y_train, y_preds_train)
    mae = mean_absolute_error(y_train, y_preds_train)
    rmse = mean_squared_error(y_train, y_preds_train, squared=False)

    ridge_train = pd.DataFrame({'mae': mae,
                                'mse': mse,
                                'rmse': rmse},
                               index=[index_train])

    mse = mean_squared_error(y_test, y_preds_test)
    mae = mean_absolute_error(y_test, y_preds_test)
    rmse = mean_squared_error(y_test, y_preds_test, squared=False)

    ridge_test = pd.DataFrame({'mae': mae,
                               'mse': mse,
                               'rmse': rmse},
                              index=[index_test])

    ridge_models = pd.concat([ridge_train, ridge_test])

    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(16, 6))

    # Plot the predicted vs actual target values for the training set
    axes[0].plot(y_train, y_preds_train, 'o',
                 color='orange', label='Predictions')
    axes[0].plot(y_train, y_train, '-', color='red', label='Actual')

    axes[0].set_xlabel('Actual')
    axes[0].set_ylabel('Predicted')
    axes[0].set_title(
        f'{index_train}: Comparison of Actual vs. Predicted Target')

    axes[0].legend()

    axes[1].plot(y_test, y_preds_test, 'o',
                 color='orange', label='Predictions')
    axes[1].plot(y_test, y_test, '-', color='red', label='Actual')

    axes[1].set_xlabel('Actual')
    axes[1].set_ylabel('Predicted')
    axes[1].set_title(
        f'{index_test}: Comparison of Actual vs. Predicted Target')

    axes[1].legend()

    plt.show()

    return ridge_models

1. Ridge Regression with All Features¶

In [376]:
ridge = ridgemodel(X_train, y_train, X_val, y_val, 'Ridge_Train', 'Ridge_Val')
ridge
Out[376]:
mae mse rmse
Ridge_Train 2061.966576 9.326546e+06 3053.939442
Ridge_Val 2211.667197 9.930921e+06 3151.336341

2. Ridge Regression Feature Engineering 1 (Correlation >= 0.5)¶

In [377]:
ridge_feature1 = ridgemodel(X_train_feature1, y_train, X_val_feature1, y_val, 'Ridge_FEATURE1_Train', 'Ridge_FEATURE1_Val')
ridge_feature1
Out[377]:
mae mse rmse
Ridge_FEATURE1_Train 2274.841596 1.225304e+07 3500.433978
Ridge_FEATURE1_Val 2108.951082 9.600463e+06 3098.461357

3. Ridge Regression Feature Engineering 2 (Correlation >= 0.7)¶

In [378]:
ridge_feature2 = ridgemodel(X_train_feature2, y_train, X_val_feature2, y_val, 'Ridge_FEATURE2_Train', 'Ridge_FEATURE2_Val')
ridge_feature2
Out[378]:
mae mse rmse
Ridge_FEATURE2_Train 2084.101832 1.031979e+07 3212.443623
Ridge_FEATURE2_Val 2102.691899 9.118768e+06 3019.729731

4. Ridge Regression Feature Engineering 3 (Correlation >= 0.8)¶

In [379]:
ridge_feature3 = ridgemodel(X_train_feature3, y_train, X_val_feature3, y_val, 'Ridge_FEATURE3_Train', 'Ridge_FEATURE3_Val')
ridge_feature3
Out[379]:
mae mse rmse
Ridge_FEATURE3_Train 2278.747159 1.223434e+07 3497.762131
Ridge_FEATURE3_Val 2124.893956 9.739202e+06 3120.769408
In [380]:
ridge_model = pd.concat([ridge, ridge_feature1, ridge_feature2, ridge_feature3])
ridge_model
Out[380]:
mae mse rmse
Ridge_Train 2061.966576 9.326546e+06 3053.939442
Ridge_Val 2211.667197 9.930921e+06 3151.336341
Ridge_FEATURE1_Train 2274.841596 1.225304e+07 3500.433978
Ridge_FEATURE1_Val 2108.951082 9.600463e+06 3098.461357
Ridge_FEATURE2_Train 2084.101832 1.031979e+07 3212.443623
Ridge_FEATURE2_Val 2102.691899 9.118768e+06 3019.729731
Ridge_FEATURE3_Train 2278.747159 1.223434e+07 3497.762131
Ridge_FEATURE3_Val 2124.893956 9.739202e+06 3120.769408

IV. ElasticNet¶

In [381]:
from sklearn.linear_model import ElasticNet

ElasticNet Function¶

In [382]:
def elasticnet(X_train, y_train, X_test, y_test, index_train, index_test):
    elastic_reg = ElasticNet()
    elastic_reg.fit(X_train, y_train)
    y_preds_train = elastic_reg.predict(X_train)
    y_preds_test = elastic_reg.predict(X_test)

    mse = mean_squared_error(y_train, y_preds_train)
    mae = mean_absolute_error(y_train, y_preds_train)
    rmse = mean_squared_error(y_train, y_preds_train, squared=False)

    elastic_train = pd.DataFrame({'mae': mae,
                                  'mse': mse,
                                  'rmse': rmse},
                                 index=[index_train])

    mse = mean_squared_error(y_test, y_preds_test)
    mae = mean_absolute_error(y_test, y_preds_test)
    rmse = mean_squared_error(y_test, y_preds_test, squared=False)

    elastic_test = pd.DataFrame({'mae': mae,
                                 'mse': mse,
                                 'rmse': rmse},
                                index=[index_test])

    elastic_models = pd.concat([elastic_train, elastic_test])

    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(16, 6))

    axes[0].plot(y_train, y_preds_train, 'o',
                 color='orange', label='Predictions')
    axes[0].plot(y_train, y_train, '-', color='red', label='Actual')

    axes[0].set_xlabel('Actual')
    axes[0].set_ylabel('Predicted')
    axes[0].set_title(
        f'{index_train}: Comparison of Actual vs. Predicted Target')

    axes[0].legend()

    axes[1].plot(y_test, y_preds_test, 'o',
                 color='orange', label='Predictions')
    axes[1].plot(y_test, y_test, '-', color='red', label='Actual')

    axes[1].set_xlabel('Actual')
    axes[1].set_ylabel('Predicted')
    axes[1].set_title(
        f'{index_test}: Comparison of Actual vs. Predicted Target')

    axes[1].legend()

    return elastic_models

1. ElasticNet with All Features¶

In [383]:
elastic = elasticnet(X_train, y_train, X_val, y_val, 'Elastic_Train', 'Elastic_Val')
elastic
/opt/homebrew/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:631: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 1.184e+09, tolerance: 6.038e+06
  model = cd_fast.enet_coordinate_descent(
Out[383]:
mae mse rmse
Elastic_Train 2284.880425 1.238910e+07 3519.815887
Elastic_Val 2073.707562 8.579011e+06 2928.994854

2. ElasticNet Feature Engineering 1 (Correlation >= 0.5)¶

In [384]:
elastic_feature1 = elasticnet(X_train_feature1, y_train, X_val_feature1, y_val, 'Elastic_FEATURE1_Train', 'Elastic_FEATURE1_Val')
elastic_feature1
Out[384]:
mae mse rmse
Elastic_FEATURE1_Train 2469.341627 1.406453e+07 3750.270443
Elastic_FEATURE1_Val 2185.184604 9.342266e+06 3056.512135

3. ElasticNet Feature Engineering 1 (Correlation >= 0.7)¶

In [385]:
elastic_feature2 = elasticnet(X_train_feature2, y_train, X_val_feature2, y_val, 'Elastic_FEATURE2_Train', 'Elastic_FEATURE2_Val')
elastic_feature2
Out[385]:
mae mse rmse
Elastic_FEATURE2_Train 2321.397873 1.289433e+07 3590.867219
Elastic_FEATURE2_Val 2109.088513 8.697749e+06 2949.194714

4. ElasticNet Feature Engineering 1 (Correlation >= 0.8)¶

In [386]:
elastic_feature3 = elasticnet(X_train_feature3, y_train, X_val_feature3, y_val, 'Elastic_FEATURE3_Train', 'Elastic_FEATURE3_Val')
elastic_feature3
/opt/homebrew/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:631: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 3.341e+09, tolerance: 6.038e+06
  model = cd_fast.enet_coordinate_descent(
Out[386]:
mae mse rmse
Elastic_FEATURE3_Train 2279.672369 1.223548e+07 3497.925509
Elastic_FEATURE3_Val 2126.979548 9.753365e+06 3123.037799
In [387]:
elastic_model = pd.concat([elastic, elastic_feature1, elastic_feature2, elastic_feature3])
elastic_model
Out[387]:
mae mse rmse
Elastic_Train 2284.880425 1.238910e+07 3519.815887
Elastic_Val 2073.707562 8.579011e+06 2928.994854
Elastic_FEATURE1_Train 2469.341627 1.406453e+07 3750.270443
Elastic_FEATURE1_Val 2185.184604 9.342266e+06 3056.512135
Elastic_FEATURE2_Train 2321.397873 1.289433e+07 3590.867219
Elastic_FEATURE2_Val 2109.088513 8.697749e+06 2949.194714
Elastic_FEATURE3_Train 2279.672369 1.223548e+07 3497.925509
Elastic_FEATURE3_Val 2126.979548 9.753365e+06 3123.037799

V. Decision Tree Regression¶

In [388]:
from sklearn.tree import DecisionTreeRegressor, plot_tree

Decision Tree Function (Default Hyperparameter)¶

In [389]:
def decision_tree_regression(X_train, y_train, X_test, y_test, index_train, index_test):
    regressor = DecisionTreeRegressor(random_state=42).fit(X_train, y_train)
    
    y_preds_train = regressor.predict(X_train)
    y_preds_test = regressor.predict(X_test)

    mse = mean_squared_error(y_train, y_preds_train)
    mae = mean_absolute_error(y_train, y_preds_train)
    rmse = mean_squared_error(y_train, y_preds_train, squared=False)

    dt_train = pd.DataFrame({'mae': mae,
                                  'mse': mse,
                                  'rmse': rmse},
                                 index=[index_train])

    mse = mean_squared_error(y_test, y_preds_test)
    mae = mean_absolute_error(y_test, y_preds_test)
    rmse = mean_squared_error(y_test, y_preds_test, squared=False)

    dt_test = pd.DataFrame({'mae': mae,
                                 'mse': mse,
                                 'rmse': rmse},
                                index=[index_test])
    
    dt_models = pd.concat([dt_train, dt_test])

    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(24, 6))
    axes[0].set_title(f'Decision Tree: {index_train}')
    plot_tree(regressor, ax=axes[0], filled=True)

    axes[1].set_title(f'Decision  Tree: {index_test}')
    plot_tree(regressor, ax=axes[1], filled=True)
    
    plt.tight_layout()
    plt.show()

    feature_importances = regressor.feature_importances_
    sorted_indices = np.argsort(feature_importances)[::-1]
    
    feature_names = X_train.columns.values
    
    sorted_feature_importances = feature_importances[sorted_indices]
    sorted_feature_names = feature_names[sorted_indices]
    
    plt.figure(figsize=(6, 20))
    plt.barh(range(len(sorted_feature_importances)), sorted_feature_importances[::-1], align='center')
    plt.yticks(range(len(sorted_feature_importances)), sorted_feature_names[::-1])
    plt.xlabel("Feature Importance")
    plt.ylabel("Features")
    plt.title("Decision Tree Regressor - Feature Importance")
    plt.show()

    return dt_models

1. Decision Tree Regression with All Features¶

In [390]:
dt = decision_tree_regression(X_train, y_train, X_val, y_val, 'Dtregressor_Train', 'Dtregressor_Val')
dt
Out[390]:
mae mse rmse
Dtregressor_Train 0.000000 0.000000e+00 0.000000
Dtregressor_Val 3022.997923 1.981871e+07 4451.821535

2. Decision Tree Regression Feature Engineering 1 (Correlation >= 0.5)¶

In [391]:
dt_feature1 = decision_tree_regression(X_train_feature1, y_train, X_val_feature1, y_val, 'Dtregressor_FEATURE1_Train', 'Dtregressor_FEATURE1_Val')
dt_feature1
Out[391]:
mae mse rmse
Dtregressor_FEATURE1_Train 0.000000 0.000000e+00 0.000000
Dtregressor_FEATURE1_Val 3142.906175 2.105998e+07 4589.114999

3. Decision Tree Regression Feature Engineering 2 (Correlation >= 0.7)¶

In [392]:
dt_feature2 = decision_tree_regression(X_train_feature2, y_train, X_val_feature2, y_val, 'Dtregressor_FEATURE2_Train', 'Dtregressor_FEATURE2_Val')
dt_feature2
Out[392]:
mae mse rmse
Dtregressor_FEATURE2_Train 0.00000 0.000000e+00 0.000000
Dtregressor_FEATURE2_Val 3379.59929 2.756409e+07 5250.151896

4. Decision Tree Regression Feature Engineering 3 (Correlation >= 0.8)¶

In [393]:
dt_feature3 = decision_tree_regression(X_train_feature3, y_train, X_val_feature3, y_val, 'Dtregressor_FEATURE3_Train', 'Dtregressor_FEATURE3_Val')
dt_feature3
Out[393]:
mae mse rmse
Dtregressor_FEATURE3_Train 0.000000 0.000000e+00 0.0000
Dtregressor_FEATURE3_Val 3090.758525 2.071248e+07 4551.0961

Hyperparameter Tuning¶

Reduce Overfitting with max_depth (default = None)¶

In [394]:
max_depth = [2, 5, 10, 20, 50, 100, 150, 200, None]

train_mae = []
val_mae = []
train_mse = []
val_mse = []
train_rmse = []
val_rmse = []

for k in max_depth:
    regressor = DecisionTreeRegressor(random_state=42, max_depth=k).fit(X_train, y_train)

    y_preds_train = regressor.predict(X_train)
    y_preds_val = regressor.predict(X_val)

    train_mae.append(mean_absolute_error(y_train, y_preds_train))
    val_mae.append(mean_absolute_error(y_val, y_preds_val))

    train_mse.append(mean_squared_error(y_train, y_preds_train))
    val_mse.append(mean_squared_error(y_val, y_preds_val))

    train_rmse.append(mean_squared_error(y_train, y_preds_train, squared=False))
    val_rmse.append(mean_squared_error(y_val, y_preds_val, squared=False))

result_max_depth = pd.DataFrame({'train_mae': train_mae,
                                 'test_mae': val_mae,
                                 'train_mse': train_mse,
                                 'test_mse': val_mse,
                                 'train_rmse': train_rmse,
                                 'test_rmse': val_rmse}, index=max_depth)
result_max_depth
Out[394]:
train_mae test_mae train_mse test_mse train_rmse test_rmse
2.0 3815.480925 3614.774442 2.645389e+07 2.241880e+07 5143.334947 4734.849732
5.0 2017.879079 2632.876525 8.616914e+06 1.638042e+07 2935.458046 4047.273239
10.0 432.658774 3125.795647 6.059501e+05 2.398058e+07 778.427956 4896.996807
20.0 5.152083 3292.034456 1.172666e+03 2.500349e+07 34.244207 5000.348965
50.0 0.000000 3022.997923 0.000000e+00 1.981871e+07 0.000000 4451.821535
100.0 0.000000 3022.997923 0.000000e+00 1.981871e+07 0.000000 4451.821535
150.0 0.000000 3022.997923 0.000000e+00 1.981871e+07 0.000000 4451.821535
200.0 0.000000 3022.997923 0.000000e+00 1.981871e+07 0.000000 4451.821535
NaN 0.000000 3022.997923 0.000000e+00 1.981871e+07 0.000000 4451.821535
In [395]:
def plot_performance(parameter, xlabel):

    fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(16, 5))

    ax1.plot(parameter, train_mae, label='Mean Absolute Errors: Train')
    ax1.plot(parameter, val_mae, label='Mean Absolute Errors: Validation')
    ax1.legend()
    ax1.set_xlabel(xlabel)
    ax1.set_ylabel('MAE')
    ax1.set_title('Mean Absolute Error')

    ax2.plot(parameter, train_mse, label='Mean Squared Errors: Train')
    ax2.plot(parameter, val_mse, label='Mean Squared Errors: Validation')
    ax2.legend()
    ax2.set_xlabel(xlabel)
    ax2.set_ylabel('MSE')
    ax2.set_title('Mean Squared Error')

    ax3.plot(parameter, train_rmse, label='Root Mean Squared Errors: Train')
    ax3.plot(parameter, val_rmse, label='Root Mean Squared Errors: Validation')
    ax3.legend()
    ax3.set_xlabel(xlabel)
    ax3.set_ylabel('RMSE')
    ax3.set_title('Root Mean Squared Errors')
    
    plt.subplots_adjust(wspace=0.4)
    plt.show()
In [396]:
plot_performance(max_depth, 'Number of Max Depth')

max_depth = 5 appears to be the best choice¶

Reduce Overfitting with min_samples_split (max_depth=10)¶

In [397]:
min_sample_split = [2, 5, 10, 20, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500]

train_mae = []
val_mae = []
train_mse = []
val_mse = []
train_rmse = []
val_rmse = []

for k in min_sample_split:
    regressor = DecisionTreeRegressor(random_state=42, max_depth=5, min_samples_split=k).fit(X_train, y_train)

    y_preds_train = regressor.predict(X_train)
    y_preds_val = regressor.predict(X_val)

    train_mae.append(mean_absolute_error(y_train, y_preds_train))
    val_mae.append(mean_absolute_error(y_val, y_preds_val))

    train_mse.append(mean_squared_error(y_train, y_preds_train))
    val_mse.append(mean_squared_error(y_val, y_preds_val))

    train_rmse.append(mean_squared_error(y_train, y_preds_train, squared=False))
    val_rmse.append(mean_squared_error(y_val, y_preds_val, squared=False))

result_min_split = pd.DataFrame({'train_mae': train_mae,
                                 'test_mae': val_mae,
                                 'train_mse': train_mse,
                                 'test_mse': val_mse,
                                 'train_rmse': train_rmse,
                                 'test_rmse': val_rmse}, index=min_sample_split)
result_min_split
Out[397]:
train_mae test_mae train_mse test_mse train_rmse test_rmse
2 2017.879079 2632.876525 8.616914e+06 1.638042e+07 2935.458046 4047.273239
5 2039.619128 2668.817737 8.705258e+06 1.668670e+07 2950.467421 4084.935487
10 2113.086587 2616.476637 9.294261e+06 1.696806e+07 3048.649072 4119.230275
20 2145.584653 2647.062769 9.529820e+06 1.720362e+07 3087.040677 4147.724514
50 2405.394644 2453.609358 1.205364e+07 1.397932e+07 3471.835964 3738.892500
100 3005.717701 2827.152308 1.960527e+07 1.591273e+07 4427.783672 3989.076370
150 3263.783929 3107.642604 2.216672e+07 1.741710e+07 4708.154557 4173.379489
200 4062.315694 3818.220233 3.372255e+07 3.003290e+07 5807.111693 5480.227973
250 4303.294377 4110.614686 3.558026e+07 3.377878e+07 5964.918713 5811.951725
300 4303.294377 4110.614686 3.558026e+07 3.377878e+07 5964.918713 5811.951725
350 4303.294377 4110.614686 3.558026e+07 3.377878e+07 5964.918713 5811.951725
400 5428.056042 5216.171946 4.828906e+07 4.676248e+07 6949.033069 6838.310091
450 5428.056042 5216.171946 4.828906e+07 4.676248e+07 6949.033069 6838.310091
500 5428.056042 5216.171946 4.828906e+07 4.676248e+07 6949.033069 6838.310091
In [398]:
plot_performance(min_sample_split, 'Number of min_samples_split')

min_samples_split = 50 appears to be the best choice

Reduce Overfitting with min_samples_leaf (max_depth=10, min_samples_split=10)¶

In [399]:
min_samples_leaf = [2, 5, 10, 20, 50, 100, 150, 200]

train_mae = []
val_mae = []
train_mse = []
val_mse = []
train_rmse = []
val_rmse = []

for k in min_samples_leaf:
    regressor = DecisionTreeRegressor(random_state=42, max_depth=5, min_samples_split=50, min_samples_leaf=k).fit(X_train, y_train)

    y_preds_train = regressor.predict(X_train)
    y_preds_val = regressor.predict(X_val)

    train_mae.append(mean_absolute_error(y_train, y_preds_train))
    val_mae.append(mean_absolute_error(y_val, y_preds_val))

    train_mse.append(mean_squared_error(y_train, y_preds_train))
    val_mse.append(mean_squared_error(y_val, y_preds_val))

    train_rmse.append(mean_squared_error(y_train, y_preds_train, squared=False))
    val_rmse.append(mean_squared_error(y_val, y_preds_val, squared=False))

result_min_leaf = pd.DataFrame({'train_mae': train_mae,
                                 'test_mae': val_mae,
                                 'train_mse': train_mse,
                                 'test_mse': val_mse,
                                 'train_rmse': train_rmse,
                                 'test_rmse': val_rmse}, index=min_samples_leaf)
result_min_leaf
Out[399]:
train_mae test_mae train_mse test_mse train_rmse test_rmse
2 2434.628170 2495.077908 1.234958e+07 1.434912e+07 3514.197533 3788.023115
5 2400.117670 2497.152084 1.252384e+07 1.462245e+07 3538.903512 3823.930808
10 2529.828746 2693.498737 1.408535e+07 1.772781e+07 3753.045990 4210.440371
20 2542.741510 2787.395009 1.481632e+07 1.640833e+07 3849.197790 4050.719254
50 3023.045614 2977.038460 1.974248e+07 1.712419e+07 4443.251073 4138.138037
100 4096.456639 3858.048355 3.413148e+07 3.090315e+07 5842.215653 5559.060013
150 4303.294377 4110.614686 3.558026e+07 3.377878e+07 5964.918713 5811.951725
200 5393.094725 4675.050604 4.920576e+07 3.827844e+07 7014.681598 6186.957172
In [400]:
plot_performance(min_samples_leaf, 'Number of min_samples_leaf')

min_samples_leaf = 2 appears to be the best choice¶

Reduce Overfitting with max_features (max_depth=10, min_samples_split=10, min_samples_leaf=20)¶

In [401]:
max_features = list(range(1, 40))

train_mae = []
val_mae = []
train_mse = []
val_mse = []
train_rmse = []
val_rmse = []

for k in max_features:
    regressor = DecisionTreeRegressor(random_state=42, max_depth=5, min_samples_split=50, min_samples_leaf=2, max_features=k).fit(X_train, y_train)

    y_preds_train = regressor.predict(X_train)
    y_preds_val = regressor.predict(X_val)

    train_mae.append(mean_absolute_error(y_train, y_preds_train))
    val_mae.append(mean_absolute_error(y_val, y_preds_val))

    train_mse.append(mean_squared_error(y_train, y_preds_train))
    val_mse.append(mean_squared_error(y_val, y_preds_val))

    train_rmse.append(mean_squared_error(y_train, y_preds_train, squared=False))
    val_rmse.append(mean_squared_error(y_val, y_preds_val, squared=False))

result_max_features = pd.DataFrame({'train_mae': train_mae,
                                 'test_mae': val_mae,
                                 'train_mse': train_mse,
                                 'test_mse': val_mse,
                                 'train_rmse': train_rmse,
                                 'test_rmse': val_rmse}, index=max_features)
result_max_features
Out[401]:
train_mae test_mae train_mse test_mse train_rmse test_rmse
1 4264.657260 4616.025908 3.212019e+07 3.590294e+07 5667.467929 5991.906389
2 3881.229772 4051.035539 2.541137e+07 3.018227e+07 5040.968720 5493.839657
3 3426.687014 3721.874845 2.364337e+07 2.306265e+07 4862.445246 4802.358837
4 3097.223558 3580.844608 1.866593e+07 2.397298e+07 4320.408298 4896.220703
5 3202.852552 3238.518236 2.015210e+07 2.089132e+07 4489.109075 4570.702049
6 3106.749747 3474.954972 1.910198e+07 2.435943e+07 4370.580762 4935.527600
7 2728.941066 2887.399486 1.673508e+07 1.758785e+07 4090.853564 4193.786896
8 2666.172170 3081.998005 1.515382e+07 1.999276e+07 3892.790373 4471.326484
9 2664.665506 3068.606964 1.523465e+07 1.875990e+07 3903.159341 4331.269661
10 2415.864738 2343.437093 1.281677e+07 1.175229e+07 3580.051743 3428.161970
11 2654.738472 2897.205908 1.495893e+07 1.681862e+07 3867.677397 4101.051515
12 2675.450560 3075.612277 1.679244e+07 1.990135e+07 4097.858027 4461.092723
13 2484.843948 2889.408818 1.352989e+07 1.920853e+07 3678.300071 4382.753376
14 2595.096972 3068.558237 1.430120e+07 1.838980e+07 3781.693126 4288.333164
15 2718.399858 3016.266256 1.549448e+07 1.918380e+07 3936.303111 4379.931101
16 2582.769468 2934.521535 1.614190e+07 1.652437e+07 4017.697807 4065.017308
17 2530.412028 2864.433139 1.385684e+07 1.625640e+07 3722.477498 4031.922083
18 2549.521050 2871.871144 1.324068e+07 1.791743e+07 3638.775045 4232.898402
19 2685.614858 2787.644288 1.566936e+07 1.919058e+07 3958.454428 4380.705239
20 2690.517661 2755.131712 1.593639e+07 1.592006e+07 3992.040610 3989.994524
21 2713.216617 2790.841853 1.680501e+07 1.661129e+07 4099.390921 4075.695516
22 2519.717968 2625.160921 1.336965e+07 1.485764e+07 3656.452782 3854.561401
23 2632.505031 2772.088734 1.505643e+07 1.744229e+07 3880.261976 4176.397070
24 2664.922298 3066.453710 1.623056e+07 1.739398e+07 4028.716796 4170.609643
25 2598.182466 2894.304121 1.510566e+07 1.545932e+07 3886.599621 3931.834754
26 2486.253606 2802.219441 1.275504e+07 1.889001e+07 3571.420374 4346.264177
27 2476.739796 2941.197604 1.267193e+07 2.079016e+07 3559.765707 4559.622518
28 2747.288799 2951.437696 1.719211e+07 1.743517e+07 4146.337260 4175.543852
29 2639.839210 2920.397189 1.610752e+07 1.798974e+07 4013.417492 4241.431158
30 2635.788221 2859.073720 1.488086e+07 1.725833e+07 3857.572241 4154.314315
31 2768.473489 2814.408148 1.718205e+07 1.580990e+07 4145.123584 3976.166184
32 2429.230821 2776.043621 1.260990e+07 1.723332e+07 3551.041823 4151.303390
33 2464.360038 2912.959982 1.308792e+07 2.071590e+07 3617.723629 4551.472378
34 2608.391222 3237.114353 1.397910e+07 2.308207e+07 3738.863757 4804.380732
35 2510.333978 2790.691469 1.327830e+07 1.607366e+07 3643.940532 4009.197502
36 2489.379869 2396.316350 1.250927e+07 1.174200e+07 3536.844009 3426.660796
37 2556.300489 2602.215159 1.308516e+07 1.471822e+07 3617.341451 3836.433054
38 2429.550738 2956.865052 1.267424e+07 2.228661e+07 3560.089871 4720.870151
39 2479.682919 2686.939230 1.285882e+07 1.576122e+07 3585.919783 3970.039915
In [402]:
plot_performance(max_features, 'Number of max_features')

max_features = 10 appears to be the best choice¶

Reduce Overfitting with criterion (max_depth=10, min_samples_split=10, min_samples_leaf=20, max_features=10)¶

In [403]:
criterion = ['absolute_error', 'squared_error', 'friedman_mse', 'poisson']

train_mae = []
val_mae = []
train_mse = []
val_mse = []
train_rmse = []
val_rmse = []

for k in criterion:
    regressor = DecisionTreeRegressor(random_state=42, max_depth=5, min_samples_split=50, min_samples_leaf=5, max_features=10, criterion=k).fit(X_train, y_train)

    y_preds_train = regressor.predict(X_train)
    y_preds_val = regressor.predict(X_val)

    train_mae.append(mean_absolute_error(y_train, y_preds_train))
    val_mae.append(mean_absolute_error(y_val, y_preds_val))

    train_mse.append(mean_squared_error(y_train, y_preds_train))
    val_mse.append(mean_squared_error(y_val, y_preds_val))

    train_rmse.append(mean_squared_error(y_train, y_preds_train, squared=False))
    val_rmse.append(mean_squared_error(y_val, y_preds_val, squared=False))

result_criterion = pd.DataFrame({'train_mae': train_mae,
                                 'test_mae': val_mae,
                                 'train_mse': train_mse,
                                 'test_mse': val_mse,
                                 'train_rmse': train_rmse,
                                 'test_rmse': val_rmse}, index=criterion)
result_criterion
Out[403]:
train_mae test_mae train_mse test_mse train_rmse test_rmse
absolute_error 2538.360879 2528.292131 1.766422e+07 1.374484e+07 4202.881995 3707.403884
squared_error 2428.632887 2377.058551 1.302080e+07 1.166857e+07 3608.435026 3415.929063
friedman_mse 2428.632887 2377.058551 1.302080e+07 1.166857e+07 3608.435026 3415.929063
poisson 2696.912025 2604.426164 1.551771e+07 1.462207e+07 3939.252978 3823.881632
In [404]:
plot_performance(criterion, 'Criterion function')

After Hyperparameter tuning¶

Decision Tree Regressor (After tuning)

In [405]:
import altair as alt

def decision_tree_regression_tuning(X_train, y_train, X_test, y_test, index_train, index_test):
    regressor = DecisionTreeRegressor(random_state=42, max_depth=5, min_samples_split=50, min_samples_leaf=5, max_features=10, criterion='squared_error').fit(X_train, y_train)

    y_preds_train = regressor.predict(X_train)
    y_preds_test = regressor.predict(X_test)

    mse = mean_squared_error(y_train, y_preds_train)
    mae = mean_absolute_error(y_train, y_preds_train)
    rmse = mean_squared_error(y_train, y_preds_train, squared=False)

    dt_train = pd.DataFrame({'mae': mae,
                             'mse': mse,
                             'rmse': rmse},
                            index=[index_train])

    mse = mean_squared_error(y_test, y_preds_test)
    mae = mean_absolute_error(y_test, y_preds_test)
    rmse = mean_squared_error(y_test, y_preds_test, squared=False)

    dt_test = pd.DataFrame({'mae': mae,
                            'mse': mse,
                            'rmse': rmse},
                           index=[index_test])

    dt_models = pd.concat([dt_train, dt_test])

    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(24, 6))
    axes[0].set_title(f'Decision Tree: {index_train}')
    plot_tree(regressor, ax=axes[0], filled=True)

    axes[1].set_title(f'Decision  Tree: {index_test}')
    plot_tree(regressor, ax=axes[1], filled=True)

    plt.tight_layout()
    plt.show()

    feature_importances = regressor.feature_importances_
    sorted_indices = np.argsort(feature_importances)[::-1]
    
    feature_names = X_train.columns.values
    
    sorted_feature_importances = feature_importances[sorted_indices]
    sorted_feature_names = feature_names[sorted_indices]
    
    plt.figure(figsize=(6, 20))
    plt.barh(range(len(sorted_feature_importances)), sorted_feature_importances[::-1], align='center')
    plt.yticks(range(len(sorted_feature_importances)), sorted_feature_names[::-1])
    plt.xlabel("Feature Importance")
    plt.ylabel("Features")
    plt.title("Decision Tree Regressor - Feature Importance")
    plt.show()

    return dt_models

1. Decision Tree Regression with All Features¶

In [406]:
dt_tuning = decision_tree_regression_tuning(X_train, y_train, X_val, y_val, 'DT_Tune_Train', 'DT_Tune_Val')
dt_tuning
Out[406]:
mae mse rmse
DT_Tune_Train 2428.632887 1.302080e+07 3608.435026
DT_Tune_Val 2377.058551 1.166857e+07 3415.929063

2. Decision Tree Regression with Feature Enginerring 1 (Correlation > 0.5)¶

In [407]:
dt_feature1_tuning = decision_tree_regression_tuning(X_train_feature1, y_train, X_val_feature1, y_val, 'DT_Tune_FEATURE1_Train', 'DT_Tune_FEATURE1_Val')
dt_feature1_tuning
Out[407]:
mae mse rmse
DT_Tune_FEATURE1_Train 2634.547381 1.511255e+07 3887.486135
DT_Tune_FEATURE1_Val 2843.669581 1.846317e+07 4296.879298

3. Decision Tree Regression with Feature Enginerring 1 (Correlation > 0.7)¶

In [408]:
dt_feature2_tuning = decision_tree_regression_tuning(X_train_feature2, y_train, X_val_feature2, y_val, 'DT_Tune_FEATURE2_Train', 'DT_Tune_FEATURE2_Val')
dt_feature2_tuning
Out[408]:
mae mse rmse
DT_Tune_FEATURE2_Train 2640.609970 1.566922e+07 3958.435982
DT_Tune_FEATURE2_Val 2532.673206 1.454901e+07 3814.316998

4. Decision Tree Regression with Feature Enginerring 1 (Correlation > 0.8)¶

In [409]:
dt_feature3_tuning = decision_tree_regression_tuning(X_train_feature3, y_train, X_val_feature3, y_val, 'DT_Tune_FEATURE3_Train', 'DT_Tune_FEATURE3_Val')
dt_feature3_tuning
Out[409]:
mae mse rmse
DT_Tune_FEATURE3_Train 2634.547381 1.511255e+07 3887.486135
DT_Tune_FEATURE3_Val 2855.303550 1.849346e+07 4300.402255
In [410]:
dt_model = pd.concat([dt, dt_feature1, dt_feature2, dt_feature3, dt_tuning, dt_feature1_tuning, dt_feature2_tuning, dt_feature3_tuning])
dt_model
Out[410]:
mae mse rmse
Dtregressor_Train 0.000000 0.000000e+00 0.000000
Dtregressor_Val 3022.997923 1.981871e+07 4451.821535
Dtregressor_FEATURE1_Train 0.000000 0.000000e+00 0.000000
Dtregressor_FEATURE1_Val 3142.906175 2.105998e+07 4589.114999
Dtregressor_FEATURE2_Train 0.000000 0.000000e+00 0.000000
Dtregressor_FEATURE2_Val 3379.599290 2.756409e+07 5250.151896
Dtregressor_FEATURE3_Train 0.000000 0.000000e+00 0.000000
Dtregressor_FEATURE3_Val 3090.758525 2.071248e+07 4551.096100
DT_Tune_Train 2428.632887 1.302080e+07 3608.435026
DT_Tune_Val 2377.058551 1.166857e+07 3415.929063
DT_Tune_FEATURE1_Train 2634.547381 1.511255e+07 3887.486135
DT_Tune_FEATURE1_Val 2843.669581 1.846317e+07 4296.879298
DT_Tune_FEATURE2_Train 2640.609970 1.566922e+07 3958.435982
DT_Tune_FEATURE2_Val 2532.673206 1.454901e+07 3814.316998
DT_Tune_FEATURE3_Train 2634.547381 1.511255e+07 3887.486135
DT_Tune_FEATURE3_Val 2855.303550 1.849346e+07 4300.402255

Based on feature importance:

Top 5 Features

  • total_amt', Period2022-08_M: Total amount spending in 2022-08
  • total_count', Period2022-12_M: Total Transaction in 2022-12
  • total_amt', Period2022-02_M: Total amount spending in 2022-02
  • total_count', Period2022-01_M: Total transaction in 2022-01
  • total_amt', Period2022-05_M: Total amount spending in 2022-05

VI. Random Forest Regression¶

In [411]:
from sklearn.ensemble import RandomForestRegressor

Random Forest Function (Default Hyperparameter)¶

In [412]:
def random_forest_regressor(X_train, y_train, X_test, y_test, index_train, index_test):
    regressor = RandomForestRegressor(random_state=42).fit(X_train, y_train)
    
    y_preds_train = regressor.predict(X_train)
    y_preds_test = regressor.predict(X_test)

    mse = mean_squared_error(y_train, y_preds_train)
    mae = mean_absolute_error(y_train, y_preds_train)
    rmse = mean_squared_error(y_train, y_preds_train, squared=False)

    rf_train = pd.DataFrame({'mae': mae,
                             'mse': mse,
                             'rmse': rmse},
                            index=[index_train])

    mse = mean_squared_error(y_test, y_preds_test)
    mae = mean_absolute_error(y_test, y_preds_test)
    rmse = mean_squared_error(y_test, y_preds_test, squared=False)

    rf_test = pd.DataFrame({'mae': mae,
                            'mse': mse,
                            'rmse': rmse},
                           index=[index_test])
    
    dt_models = pd.concat([rf_train, rf_test])
    
    feature_importances = regressor.feature_importances_
    sorted_indices = np.argsort(feature_importances)[::-1]
    
    feature_names = X_train.columns.values
    
    sorted_feature_importances = feature_importances[sorted_indices]
    sorted_feature_names = feature_names[sorted_indices]
    
    plt.figure(figsize=(6, 20))
    plt.barh(range(len(sorted_feature_importances)), sorted_feature_importances[::-1], align='center')
    plt.yticks(range(len(sorted_feature_importances)), sorted_feature_names[::-1])
    plt.xlabel("Feature Importance")
    plt.ylabel("Features")
    plt.title("Random Forest Regressor - Feature Importance")
    plt.show()
    
    return dt_models

1. Random Forest Regressor with All Features¶

In [413]:
rf = random_forest_regressor(X_train, y_train, X_val, y_val, 'RFregressor_Train', 'RFregressor_Val')
rf
Out[413]:
mae mse rmse
RFregressor_Train 952.337505 2.045942e+06 1430.364188
RFregressor_Val 2203.343703 9.785882e+06 3128.239434

2. Random Forest Regressor with Feature Enginerring 1 (Correlation > 0.5)¶

In [414]:
rf_feature1 = random_forest_regressor(X_train_feature1, y_train, X_val_feature1, y_val, 'RFregressor_FEATURE1_Train', 'RFregressor_FEATURE1_Val')
rf_feature1
Out[414]:
mae mse rmse
RFregressor_FEATURE1_Train 969.471965 2.116189e+06 1454.712785
RFregressor_FEATURE1_Val 2187.242154 9.871982e+06 3141.970979

3. Random Forest Regressor with Feature Enginerring 2 (Correlation > 0.7)¶

In [415]:
rf_feature2 = random_forest_regressor(X_train_feature2, y_train, X_val_feature2, y_val, 'RFregressor_FEATURE2_Train', 'RFregressor_FEATURE2_Val')
rf_feature2
Out[415]:
mae mse rmse
RFregressor_FEATURE2_Train 954.917457 2.064596e+06 1436.870097
RFregressor_FEATURE2_Val 2206.446608 9.923375e+06 3150.138845

4. Random Forest Regressor with Feature Enginerring 3 (Correlation > 0.8)¶

In [416]:
rf_feature3= random_forest_regressor(X_train_feature3, y_train, X_val_feature3, y_val, 'RFregressor_FEATURE3_Train', 'RFregressor_FEATURE3_Val')
rf_feature3
Out[416]:
mae mse rmse
RFregressor_FEATURE3_Train 969.307015 2.117008e+06 1454.994201
RFregressor_FEATURE3_Val 2190.343906 9.880833e+06 3143.379168

Hyperparameter Tuning¶

Reduce Overfitting with n_estimators¶

In [417]:
n_estimators = [2, 5, 10, 20, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500]

train_mae = []
val_mae = []
train_mse = []
val_mse = []
train_rmse = []
val_rmse = []

for k in n_estimators:
    regressor = RandomForestRegressor(random_state=42, n_estimators=k).fit(X_train, y_train)

    y_preds_train = regressor.predict(X_train)
    y_preds_val = regressor.predict(X_val)

    train_mae.append(mean_absolute_error(y_train, y_preds_train))
    val_mae.append(mean_absolute_error(y_val, y_preds_val))

    train_mse.append(mean_squared_error(y_train, y_preds_train))
    val_mse.append(mean_squared_error(y_val, y_preds_val))

    train_rmse.append(mean_squared_error(y_train, y_preds_train, squared=False))
    val_rmse.append(mean_squared_error(y_val, y_preds_val, squared=False))

result_n_estimators = pd.DataFrame({'train_mae': train_mae,
                                 'test_mae': val_mae,
                                 'train_mse': train_mse,
                                 'test_mse': val_mse,
                                 'train_rmse': train_rmse,
                                 'test_rmse': val_rmse}, index=n_estimators)
result_n_estimators
Out[417]:
train_mae test_mae train_mse test_mse train_rmse test_rmse
2 1242.610916 2987.065738 6.745545e+06 1.691260e+07 2597.218708 4112.493373
5 1126.597799 2532.092481 3.257686e+06 1.255253e+07 1804.906078 3542.955581
10 1011.604971 2413.340169 2.431891e+06 1.166108e+07 1559.452098 3414.832983
20 992.655842 2264.938874 2.275209e+06 1.027619e+07 1508.379544 3205.649030
50 982.490668 2226.859317 2.143529e+06 9.939897e+06 1464.079453 3152.760198
100 952.337505 2203.343703 2.045942e+06 9.785882e+06 1430.364188 3128.239434
150 949.291605 2213.789471 2.042899e+06 9.824783e+06 1429.300119 3134.451066
200 944.937466 2229.819886 1.999551e+06 9.881873e+06 1414.054829 3143.544727
250 951.241607 2211.912820 2.018628e+06 9.801909e+06 1420.784127 3130.800055
300 954.003582 2232.031193 2.038388e+06 9.906177e+06 1427.721104 3147.407939
350 958.054202 2234.438987 2.072641e+06 9.914890e+06 1439.666916 3148.791886
400 957.339041 2230.991642 2.069329e+06 9.874852e+06 1438.516095 3142.427782
450 954.585142 2231.985573 2.064817e+06 9.862552e+06 1436.947271 3140.470014
500 953.531756 2228.501206 2.059465e+06 9.859875e+06 1435.083565 3140.043856
In [418]:
plot_performance(n_estimators, 'Number of n_estimators')

The best n_estimators seems to be 100

Reduce Overfitting with max_depth¶

In [419]:
max_depth = [2, 5, 10, 20, 50, 100, 150, 200, None]

train_mae = []
val_mae = []
train_mse = []
val_mse = []
train_rmse = []
val_rmse = []

for k in max_depth:
    regressor = RandomForestRegressor(random_state=42, n_estimators=100, max_depth=k).fit(X_train, y_train)

    y_preds_train = regressor.predict(X_train)
    y_preds_val = regressor.predict(X_val)

    train_mae.append(mean_absolute_error(y_train, y_preds_train))
    val_mae.append(mean_absolute_error(y_val, y_preds_val))

    train_mse.append(mean_squared_error(y_train, y_preds_train))
    val_mse.append(mean_squared_error(y_val, y_preds_val))

    train_rmse.append(mean_squared_error(y_train, y_preds_train, squared=False))
    val_rmse.append(mean_squared_error(y_val, y_preds_val, squared=False))

result_max_depth = pd.DataFrame({'train_mae': train_mae,
                                 'test_mae': val_mae,
                                 'train_mse': train_mse,
                                 'test_mse': val_mse,
                                 'train_rmse': train_rmse,
                                 'test_rmse': val_rmse}, index=max_depth)
result_max_depth
Out[419]:
train_mae test_mae train_mse test_mse train_rmse test_rmse
2.0 2853.453438 2754.565884 1.660048e+07 1.400695e+07 4074.368843 3742.585952
5.0 1690.756481 2245.768725 5.765749e+06 9.835034e+06 2401.197430 3136.085835
10.0 1005.398492 2173.776943 2.175859e+06 9.752327e+06 1475.079361 3122.871615
20.0 952.083485 2201.015083 2.039902e+06 9.757870e+06 1428.251283 3123.758967
50.0 952.337505 2203.343703 2.045942e+06 9.785882e+06 1430.364188 3128.239434
100.0 952.337505 2203.343703 2.045942e+06 9.785882e+06 1430.364188 3128.239434
150.0 952.337505 2203.343703 2.045942e+06 9.785882e+06 1430.364188 3128.239434
200.0 952.337505 2203.343703 2.045942e+06 9.785882e+06 1430.364188 3128.239434
NaN 952.337505 2203.343703 2.045942e+06 9.785882e+06 1430.364188 3128.239434
In [420]:
plot_performance(max_depth, 'Number of max_depth')

The best max_depth is 10

Reduce Overfitting with min_samples_split¶

In [421]:
min_sample_split = [2, 5, 10, 20, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500]

train_mae = []
val_mae = []
train_mse = []
val_mse = []
train_rmse = []
val_rmse = []

for k in min_sample_split:
    regressor = RandomForestRegressor(random_state=42, n_estimators=100, max_depth=10, min_samples_split=k).fit(X_train, y_train)

    y_preds_train = regressor.predict(X_train)
    y_preds_val = regressor.predict(X_val)

    train_mae.append(mean_absolute_error(y_train, y_preds_train))
    val_mae.append(mean_absolute_error(y_val, y_preds_val))

    train_mse.append(mean_squared_error(y_train, y_preds_train))
    val_mse.append(mean_squared_error(y_val, y_preds_val))

    train_rmse.append(mean_squared_error(y_train, y_preds_train, squared=False))
    val_rmse.append(mean_squared_error(y_val, y_preds_val, squared=False))

result_min_split = pd.DataFrame({'train_mae': train_mae,
                                 'test_mae': val_mae,
                                 'train_mse': train_mse,
                                 'test_mse': val_mse,
                                 'train_rmse': train_rmse,
                                 'test_rmse': val_rmse}, index=min_sample_split)
result_min_split
Out[421]:
train_mae test_mae train_mse test_mse train_rmse test_rmse
2 1005.398492 2173.776943 2.175859e+06 9.752327e+06 1475.079361 3122.871615
5 1082.368610 2210.049998 2.674528e+06 9.757091e+06 1635.398376 3123.634214
10 1259.059300 2199.634626 3.666149e+06 9.799558e+06 1914.719060 3130.424535
20 1551.093075 2218.976044 5.563476e+06 1.016624e+07 2358.702243 3188.454215
50 2127.672054 2385.840523 1.062035e+07 1.120086e+07 3258.888244 3346.769003
100 2764.945928 2620.879736 1.731910e+07 1.446973e+07 4161.621958 3803.910447
150 3403.335781 3160.560872 2.565301e+07 2.200975e+07 5064.880152 4691.454519
200 3421.191917 3190.272747 2.573637e+07 2.232729e+07 5073.102927 4725.176182
250 3682.532220 3524.576999 2.841669e+07 2.477537e+07 5330.730588 4977.486548
300 4345.721384 4182.219041 3.478046e+07 3.153400e+07 5897.495915 5615.514061
350 7083.538618 7509.107319 8.436794e+07 8.566889e+07 9185.202492 9255.749141
400 8309.393546 8723.341999 1.105863e+08 1.118168e+08 10516.003668 10574.347796
450 8309.393546 8723.341999 1.105863e+08 1.118168e+08 10516.003668 10574.347796
500 8309.393546 8723.341999 1.105863e+08 1.118168e+08 10516.003668 10574.347796
In [422]:
plot_performance(min_sample_split, 'Number of min_samples_split')

The best min_samples_split = 50

Reduce Overfitting with min_samples_leaf¶

In [423]:
min_samples_leaf = [2, 5, 10, 20, 50, 100, 150, 200]

train_mae = []
val_mae = []
train_mse = []
val_mse = []
train_rmse = []
val_rmse = []

for k in min_samples_leaf:
    regressor = RandomForestRegressor(random_state=42, n_estimators=100, max_depth=10, min_samples_split=50, min_samples_leaf=k).fit(X_train, y_train)

    y_preds_train = regressor.predict(X_train)
    y_preds_val = regressor.predict(X_val)

    train_mae.append(mean_absolute_error(y_train, y_preds_train))
    val_mae.append(mean_absolute_error(y_val, y_preds_val))

    train_mse.append(mean_squared_error(y_train, y_preds_train))
    val_mse.append(mean_squared_error(y_val, y_preds_val))

    train_rmse.append(mean_squared_error(y_train, y_preds_train, squared=False))
    val_rmse.append(mean_squared_error(y_val, y_preds_val, squared=False))

result_min_leaf = pd.DataFrame({'train_mae': train_mae,
                                 'test_mae': val_mae,
                                 'train_mse': train_mse,
                                 'test_mse': val_mse,
                                 'train_rmse': train_rmse,
                                 'test_rmse': val_rmse}, index=min_samples_leaf)
result_min_leaf
Out[423]:
train_mae test_mae train_mse test_mse train_rmse test_rmse
2 2141.827379 2404.326973 1.089825e+07 1.124311e+07 3301.249606 3353.075266
5 2163.526677 2390.473206 1.124620e+07 1.119163e+07 3353.536053 3345.389088
10 2204.406785 2410.624136 1.170962e+07 1.135133e+07 3421.932374 3369.173041
20 2287.403270 2442.456604 1.288107e+07 1.176767e+07 3589.020227 3430.404233
50 2831.516772 2692.189520 1.881492e+07 1.592891e+07 4337.617113 3991.104111
100 3654.767197 3382.997994 2.830094e+07 2.419499e+07 5319.863147 4918.840419
150 4771.574255 4476.755346 4.276172e+07 3.719275e+07 6539.244253 6098.585612
200 8309.393546 8723.341999 1.105863e+08 1.118168e+08 10516.003668 10574.347796
In [424]:
plot_performance(min_samples_leaf, 'Number of min_samples_leaf')

The best min_samples_leaf = 5

Reduce Overfitting with max_features¶

In [425]:
max_features = list(range(1, 100))

train_mae = []
val_mae = []
train_mse = []
val_mse = []
train_rmse = []
val_rmse = []

for k in max_features:
    regressor = RandomForestRegressor(random_state=42, n_estimators=100, max_depth=10, min_samples_split=50, min_samples_leaf=5, max_features=k).fit(X_train, y_train)

    y_preds_train = regressor.predict(X_train)
    y_preds_val = regressor.predict(X_val)

    train_mae.append(mean_absolute_error(y_train, y_preds_train))
    val_mae.append(mean_absolute_error(y_val, y_preds_val))

    train_mse.append(mean_squared_error(y_train, y_preds_train))
    val_mse.append(mean_squared_error(y_val, y_preds_val))

    train_rmse.append(mean_squared_error(y_train, y_preds_train, squared=False))
    val_rmse.append(mean_squared_error(y_val, y_preds_val, squared=False))

result_max_features = pd.DataFrame({'train_mae': train_mae,
                                 'test_mae': val_mae,
                                 'train_mse': train_mse,
                                 'test_mse': val_mse,
                                 'train_rmse': train_rmse,
                                 'test_rmse': val_rmse}, index=max_features)
result_max_features
Out[425]:
train_mae test_mae train_mse test_mse train_rmse test_rmse
1 3477.675432 3591.978705 2.279570e+07 2.285192e+07 4774.484725 4780.368513
2 2996.785590 3130.866184 1.822634e+07 1.775541e+07 4269.231262 4213.717345
3 2697.977342 2804.672885 1.565209e+07 1.409895e+07 3956.271881 3754.857489
4 2563.007738 2634.722391 1.469556e+07 1.278529e+07 3833.478616 3575.652727
5 2455.740146 2482.778473 1.371981e+07 1.141160e+07 3704.025706 3378.106174
6 2395.197031 2415.635187 1.363482e+07 1.134146e+07 3692.535814 3367.708762
7 2365.982644 2405.957880 1.286377e+07 1.087528e+07 3586.610014 3297.768640
8 2357.595785 2358.718720 1.292670e+07 1.091105e+07 3595.372040 3303.187279
9 2339.322455 2363.001359 1.303042e+07 1.093239e+07 3609.767030 3306.416751
10 2288.021607 2313.586664 1.237356e+07 1.044550e+07 3517.606962 3231.950528
11 2303.571198 2345.275577 1.246210e+07 1.075239e+07 3530.170657 3279.083880
12 2263.198193 2324.972135 1.207774e+07 1.048390e+07 3475.304881 3237.885858
13 2261.675395 2321.685589 1.198885e+07 1.046523e+07 3462.491523 3235.000632
14 2284.428986 2320.541422 1.230656e+07 1.049975e+07 3508.070968 3240.331834
15 2259.455941 2324.687848 1.224559e+07 1.073812e+07 3499.369916 3276.906946
16 2269.028731 2324.564421 1.229451e+07 1.072337e+07 3506.352529 3274.655712
17 2269.294456 2320.041288 1.222199e+07 1.086599e+07 3495.995571 3296.360812
18 2248.662305 2321.096855 1.203000e+07 1.044803e+07 3468.428812 3232.341157
19 2258.168483 2356.655551 1.212543e+07 1.063474e+07 3482.159199 3261.095514
20 2233.110780 2367.705899 1.175932e+07 1.072561e+07 3429.186113 3274.998333
21 2265.113922 2304.691750 1.215032e+07 1.045364e+07 3485.731510 3233.209403
22 2240.266854 2335.620006 1.167223e+07 1.029991e+07 3416.464697 3209.348049
23 2224.690100 2318.560599 1.165417e+07 1.041771e+07 3413.820121 3227.648191
24 2230.372806 2337.121687 1.177480e+07 1.052904e+07 3431.442778 3244.848900
25 2219.945368 2362.964875 1.163162e+07 1.057774e+07 3410.515973 3252.343348
26 2240.318404 2344.560924 1.190931e+07 1.064153e+07 3450.986971 3262.135552
27 2227.629365 2353.467212 1.163935e+07 1.058192e+07 3411.648580 3252.986321
28 2220.995455 2334.881538 1.164116e+07 1.060648e+07 3411.914842 3256.758744
29 2219.492127 2359.982153 1.166252e+07 1.060287e+07 3415.043270 3256.205141
30 2222.899028 2355.995655 1.160157e+07 1.064918e+07 3406.107307 3263.307880
31 2229.859567 2347.962188 1.180031e+07 1.046774e+07 3435.158096 3235.388406
32 2206.067967 2345.454912 1.154127e+07 1.044915e+07 3397.244430 3232.515139
33 2220.604089 2363.516672 1.174859e+07 1.066493e+07 3427.622217 3265.720755
34 2210.573180 2350.170072 1.160315e+07 1.078372e+07 3406.339071 3283.857896
35 2201.999336 2375.997269 1.147708e+07 1.075874e+07 3387.783453 3280.051217
36 2225.451652 2329.382373 1.152563e+07 1.040532e+07 3394.942263 3225.727923
37 2199.660309 2329.480813 1.143512e+07 1.032473e+07 3381.584958 3213.212111
38 2211.380415 2330.893425 1.164872e+07 1.047942e+07 3413.022021 3237.193675
39 2220.664702 2321.428615 1.150099e+07 1.040542e+07 3391.311257 3225.742935
40 2190.897092 2349.662527 1.134702e+07 1.059901e+07 3368.533575 3255.611436
41 2208.679685 2380.184891 1.151432e+07 1.065604e+07 3393.275092 3264.359068
42 2195.172664 2363.248364 1.140986e+07 1.073709e+07 3377.848963 3276.749864
43 2197.798555 2375.819634 1.153330e+07 1.085194e+07 3396.071856 3294.228784
44 2206.237803 2335.165620 1.142035e+07 1.031231e+07 3379.400114 3211.277963
45 2202.357096 2369.670373 1.137317e+07 1.047362e+07 3372.412354 3236.297323
46 2200.605109 2350.466337 1.143111e+07 1.067409e+07 3380.991769 3267.122643
47 2202.805817 2345.531062 1.150952e+07 1.052906e+07 3392.568808 3244.851283
48 2196.774474 2343.252537 1.122175e+07 1.040008e+07 3349.888185 3224.915151
49 2202.828910 2339.605899 1.147040e+07 1.052662e+07 3386.797245 3244.475442
50 2195.341509 2375.389586 1.136672e+07 1.057528e+07 3371.456005 3251.965190
51 2193.798781 2357.174145 1.164871e+07 1.076966e+07 3413.020457 3281.716336
52 2192.617729 2349.609988 1.140911e+07 1.072870e+07 3377.737272 3275.468680
53 2201.018179 2361.040949 1.129995e+07 1.055747e+07 3361.539568 3249.226062
54 2167.359910 2358.374427 1.130564e+07 1.060692e+07 3362.385532 3256.826985
55 2186.617296 2350.329180 1.124623e+07 1.038978e+07 3353.539184 3223.317915
56 2175.129498 2334.029587 1.118126e+07 1.057609e+07 3343.839064 3252.089977
57 2184.533111 2379.362957 1.138006e+07 1.092604e+07 3373.435177 3305.456699
58 2206.831104 2376.727396 1.149347e+07 1.088292e+07 3390.201464 3298.927807
59 2167.928291 2352.624917 1.125031e+07 1.083861e+07 3354.147797 3292.204874
60 2190.262508 2340.763859 1.134159e+07 1.043096e+07 3367.728188 3229.699255
61 2186.351162 2398.651876 1.142174e+07 1.102970e+07 3379.606801 3321.099699
62 2189.000472 2368.537020 1.134646e+07 1.096177e+07 3368.450007 3310.856729
63 2196.230296 2392.547030 1.146475e+07 1.082730e+07 3385.963834 3290.486357
64 2191.070341 2364.543376 1.136020e+07 1.069429e+07 3370.489140 3270.212545
65 2189.735705 2375.952124 1.133595e+07 1.066541e+07 3366.890040 3265.793211
66 2179.877695 2404.505752 1.133625e+07 1.092867e+07 3366.934461 3305.853970
67 2181.647404 2381.641043 1.142899e+07 1.089971e+07 3380.678248 3301.470690
68 2200.062800 2383.520937 1.143584e+07 1.089394e+07 3381.691252 3300.597042
69 2193.185785 2372.937359 1.127669e+07 1.082489e+07 3358.077660 3290.120568
70 2179.722317 2378.917742 1.122143e+07 1.112525e+07 3349.840668 3335.453866
71 2185.913439 2396.766372 1.137832e+07 1.104587e+07 3373.176330 3323.532864
72 2170.520913 2384.924205 1.133528e+07 1.124914e+07 3366.790663 3353.973971
73 2180.124054 2390.532155 1.130259e+07 1.122649e+07 3361.932889 3350.594885
74 2187.528284 2402.264045 1.154054e+07 1.130720e+07 3397.137680 3362.618521
75 2165.658546 2389.996353 1.113513e+07 1.112608e+07 3336.933546 3335.578032
76 2178.107152 2409.681037 1.149347e+07 1.140646e+07 3390.202349 3377.344740
77 2168.237897 2401.050385 1.122791e+07 1.124049e+07 3350.807655 3352.684650
78 2181.473126 2403.966264 1.123609e+07 1.118304e+07 3352.027612 3344.105868
79 2182.974023 2391.513480 1.141083e+07 1.122284e+07 3377.991582 3350.050479
80 2175.044720 2391.072793 1.135441e+07 1.106569e+07 3369.629886 3326.513206
81 2189.078315 2420.074593 1.153326e+07 1.147696e+07 3396.064673 3387.765747
82 2171.294464 2397.885922 1.120208e+07 1.127856e+07 3346.951439 3358.356700
83 2179.697016 2378.928047 1.132295e+07 1.108011e+07 3364.959350 3328.680407
84 2151.080439 2384.881926 1.116961e+07 1.122427e+07 3342.095926 3350.264479
85 2179.423692 2396.384202 1.132146e+07 1.116307e+07 3364.737009 3341.118718
86 2177.231377 2395.471641 1.127447e+07 1.111767e+07 3357.748201 3334.316927
87 2181.048281 2379.886606 1.124974e+07 1.094590e+07 3354.063443 3308.458634
88 2184.524816 2367.938702 1.133174e+07 1.095595e+07 3366.264618 3309.976676
89 2177.634308 2348.983069 1.129787e+07 1.070280e+07 3361.231017 3271.513334
90 2181.626819 2373.402474 1.130044e+07 1.102033e+07 3361.613222 3319.688136
91 2178.033375 2385.980360 1.124835e+07 1.109387e+07 3353.856274 3330.746192
92 2171.328467 2386.442445 1.123837e+07 1.100789e+07 3352.367142 3317.814101
93 2176.888403 2391.573655 1.137194e+07 1.120430e+07 3372.230536 3347.282741
94 2174.701624 2385.790997 1.129514e+07 1.105274e+07 3360.824846 3324.566042
95 2163.637111 2368.454600 1.120318e+07 1.100253e+07 3347.114556 3317.006576
96 2175.740832 2375.865642 1.127626e+07 1.102723e+07 3358.014426 3320.727813
97 2160.552088 2386.074411 1.124433e+07 1.111819e+07 3353.256709 3334.395394
98 2167.347134 2374.431595 1.133766e+07 1.115557e+07 3367.143626 3339.995541
99 2165.591054 2384.505743 1.126598e+07 1.116858e+07 3356.483474 3341.941808
In [426]:
result_max_features
Out[426]:
train_mae test_mae train_mse test_mse train_rmse test_rmse
1 3477.675432 3591.978705 2.279570e+07 2.285192e+07 4774.484725 4780.368513
2 2996.785590 3130.866184 1.822634e+07 1.775541e+07 4269.231262 4213.717345
3 2697.977342 2804.672885 1.565209e+07 1.409895e+07 3956.271881 3754.857489
4 2563.007738 2634.722391 1.469556e+07 1.278529e+07 3833.478616 3575.652727
5 2455.740146 2482.778473 1.371981e+07 1.141160e+07 3704.025706 3378.106174
6 2395.197031 2415.635187 1.363482e+07 1.134146e+07 3692.535814 3367.708762
7 2365.982644 2405.957880 1.286377e+07 1.087528e+07 3586.610014 3297.768640
8 2357.595785 2358.718720 1.292670e+07 1.091105e+07 3595.372040 3303.187279
9 2339.322455 2363.001359 1.303042e+07 1.093239e+07 3609.767030 3306.416751
10 2288.021607 2313.586664 1.237356e+07 1.044550e+07 3517.606962 3231.950528
11 2303.571198 2345.275577 1.246210e+07 1.075239e+07 3530.170657 3279.083880
12 2263.198193 2324.972135 1.207774e+07 1.048390e+07 3475.304881 3237.885858
13 2261.675395 2321.685589 1.198885e+07 1.046523e+07 3462.491523 3235.000632
14 2284.428986 2320.541422 1.230656e+07 1.049975e+07 3508.070968 3240.331834
15 2259.455941 2324.687848 1.224559e+07 1.073812e+07 3499.369916 3276.906946
16 2269.028731 2324.564421 1.229451e+07 1.072337e+07 3506.352529 3274.655712
17 2269.294456 2320.041288 1.222199e+07 1.086599e+07 3495.995571 3296.360812
18 2248.662305 2321.096855 1.203000e+07 1.044803e+07 3468.428812 3232.341157
19 2258.168483 2356.655551 1.212543e+07 1.063474e+07 3482.159199 3261.095514
20 2233.110780 2367.705899 1.175932e+07 1.072561e+07 3429.186113 3274.998333
21 2265.113922 2304.691750 1.215032e+07 1.045364e+07 3485.731510 3233.209403
22 2240.266854 2335.620006 1.167223e+07 1.029991e+07 3416.464697 3209.348049
23 2224.690100 2318.560599 1.165417e+07 1.041771e+07 3413.820121 3227.648191
24 2230.372806 2337.121687 1.177480e+07 1.052904e+07 3431.442778 3244.848900
25 2219.945368 2362.964875 1.163162e+07 1.057774e+07 3410.515973 3252.343348
26 2240.318404 2344.560924 1.190931e+07 1.064153e+07 3450.986971 3262.135552
27 2227.629365 2353.467212 1.163935e+07 1.058192e+07 3411.648580 3252.986321
28 2220.995455 2334.881538 1.164116e+07 1.060648e+07 3411.914842 3256.758744
29 2219.492127 2359.982153 1.166252e+07 1.060287e+07 3415.043270 3256.205141
30 2222.899028 2355.995655 1.160157e+07 1.064918e+07 3406.107307 3263.307880
31 2229.859567 2347.962188 1.180031e+07 1.046774e+07 3435.158096 3235.388406
32 2206.067967 2345.454912 1.154127e+07 1.044915e+07 3397.244430 3232.515139
33 2220.604089 2363.516672 1.174859e+07 1.066493e+07 3427.622217 3265.720755
34 2210.573180 2350.170072 1.160315e+07 1.078372e+07 3406.339071 3283.857896
35 2201.999336 2375.997269 1.147708e+07 1.075874e+07 3387.783453 3280.051217
36 2225.451652 2329.382373 1.152563e+07 1.040532e+07 3394.942263 3225.727923
37 2199.660309 2329.480813 1.143512e+07 1.032473e+07 3381.584958 3213.212111
38 2211.380415 2330.893425 1.164872e+07 1.047942e+07 3413.022021 3237.193675
39 2220.664702 2321.428615 1.150099e+07 1.040542e+07 3391.311257 3225.742935
40 2190.897092 2349.662527 1.134702e+07 1.059901e+07 3368.533575 3255.611436
41 2208.679685 2380.184891 1.151432e+07 1.065604e+07 3393.275092 3264.359068
42 2195.172664 2363.248364 1.140986e+07 1.073709e+07 3377.848963 3276.749864
43 2197.798555 2375.819634 1.153330e+07 1.085194e+07 3396.071856 3294.228784
44 2206.237803 2335.165620 1.142035e+07 1.031231e+07 3379.400114 3211.277963
45 2202.357096 2369.670373 1.137317e+07 1.047362e+07 3372.412354 3236.297323
46 2200.605109 2350.466337 1.143111e+07 1.067409e+07 3380.991769 3267.122643
47 2202.805817 2345.531062 1.150952e+07 1.052906e+07 3392.568808 3244.851283
48 2196.774474 2343.252537 1.122175e+07 1.040008e+07 3349.888185 3224.915151
49 2202.828910 2339.605899 1.147040e+07 1.052662e+07 3386.797245 3244.475442
50 2195.341509 2375.389586 1.136672e+07 1.057528e+07 3371.456005 3251.965190
51 2193.798781 2357.174145 1.164871e+07 1.076966e+07 3413.020457 3281.716336
52 2192.617729 2349.609988 1.140911e+07 1.072870e+07 3377.737272 3275.468680
53 2201.018179 2361.040949 1.129995e+07 1.055747e+07 3361.539568 3249.226062
54 2167.359910 2358.374427 1.130564e+07 1.060692e+07 3362.385532 3256.826985
55 2186.617296 2350.329180 1.124623e+07 1.038978e+07 3353.539184 3223.317915
56 2175.129498 2334.029587 1.118126e+07 1.057609e+07 3343.839064 3252.089977
57 2184.533111 2379.362957 1.138006e+07 1.092604e+07 3373.435177 3305.456699
58 2206.831104 2376.727396 1.149347e+07 1.088292e+07 3390.201464 3298.927807
59 2167.928291 2352.624917 1.125031e+07 1.083861e+07 3354.147797 3292.204874
60 2190.262508 2340.763859 1.134159e+07 1.043096e+07 3367.728188 3229.699255
61 2186.351162 2398.651876 1.142174e+07 1.102970e+07 3379.606801 3321.099699
62 2189.000472 2368.537020 1.134646e+07 1.096177e+07 3368.450007 3310.856729
63 2196.230296 2392.547030 1.146475e+07 1.082730e+07 3385.963834 3290.486357
64 2191.070341 2364.543376 1.136020e+07 1.069429e+07 3370.489140 3270.212545
65 2189.735705 2375.952124 1.133595e+07 1.066541e+07 3366.890040 3265.793211
66 2179.877695 2404.505752 1.133625e+07 1.092867e+07 3366.934461 3305.853970
67 2181.647404 2381.641043 1.142899e+07 1.089971e+07 3380.678248 3301.470690
68 2200.062800 2383.520937 1.143584e+07 1.089394e+07 3381.691252 3300.597042
69 2193.185785 2372.937359 1.127669e+07 1.082489e+07 3358.077660 3290.120568
70 2179.722317 2378.917742 1.122143e+07 1.112525e+07 3349.840668 3335.453866
71 2185.913439 2396.766372 1.137832e+07 1.104587e+07 3373.176330 3323.532864
72 2170.520913 2384.924205 1.133528e+07 1.124914e+07 3366.790663 3353.973971
73 2180.124054 2390.532155 1.130259e+07 1.122649e+07 3361.932889 3350.594885
74 2187.528284 2402.264045 1.154054e+07 1.130720e+07 3397.137680 3362.618521
75 2165.658546 2389.996353 1.113513e+07 1.112608e+07 3336.933546 3335.578032
76 2178.107152 2409.681037 1.149347e+07 1.140646e+07 3390.202349 3377.344740
77 2168.237897 2401.050385 1.122791e+07 1.124049e+07 3350.807655 3352.684650
78 2181.473126 2403.966264 1.123609e+07 1.118304e+07 3352.027612 3344.105868
79 2182.974023 2391.513480 1.141083e+07 1.122284e+07 3377.991582 3350.050479
80 2175.044720 2391.072793 1.135441e+07 1.106569e+07 3369.629886 3326.513206
81 2189.078315 2420.074593 1.153326e+07 1.147696e+07 3396.064673 3387.765747
82 2171.294464 2397.885922 1.120208e+07 1.127856e+07 3346.951439 3358.356700
83 2179.697016 2378.928047 1.132295e+07 1.108011e+07 3364.959350 3328.680407
84 2151.080439 2384.881926 1.116961e+07 1.122427e+07 3342.095926 3350.264479
85 2179.423692 2396.384202 1.132146e+07 1.116307e+07 3364.737009 3341.118718
86 2177.231377 2395.471641 1.127447e+07 1.111767e+07 3357.748201 3334.316927
87 2181.048281 2379.886606 1.124974e+07 1.094590e+07 3354.063443 3308.458634
88 2184.524816 2367.938702 1.133174e+07 1.095595e+07 3366.264618 3309.976676
89 2177.634308 2348.983069 1.129787e+07 1.070280e+07 3361.231017 3271.513334
90 2181.626819 2373.402474 1.130044e+07 1.102033e+07 3361.613222 3319.688136
91 2178.033375 2385.980360 1.124835e+07 1.109387e+07 3353.856274 3330.746192
92 2171.328467 2386.442445 1.123837e+07 1.100789e+07 3352.367142 3317.814101
93 2176.888403 2391.573655 1.137194e+07 1.120430e+07 3372.230536 3347.282741
94 2174.701624 2385.790997 1.129514e+07 1.105274e+07 3360.824846 3324.566042
95 2163.637111 2368.454600 1.120318e+07 1.100253e+07 3347.114556 3317.006576
96 2175.740832 2375.865642 1.127626e+07 1.102723e+07 3358.014426 3320.727813
97 2160.552088 2386.074411 1.124433e+07 1.111819e+07 3353.256709 3334.395394
98 2167.347134 2374.431595 1.133766e+07 1.115557e+07 3367.143626 3339.995541
99 2165.591054 2384.505743 1.126598e+07 1.116858e+07 3356.483474 3341.941808
In [427]:
plot_performance(max_features, 'Number of max_features')

The best max_features = 95

After Hyperparameter Tuning¶

In [428]:
def random_forest_regressor_tuning(X_train, y_train, X_test, y_test, index_train, index_test):
    regressor = RandomForestRegressor(random_state=42, n_estimators=100, max_depth=10, min_samples_split=50, min_samples_leaf=5, max_features=95).fit(X_train, y_train)
    
    y_preds_train = regressor.predict(X_train)
    y_preds_test = regressor.predict(X_test)

    mse = mean_squared_error(y_train, y_preds_train)
    mae = mean_absolute_error(y_train, y_preds_train)
    rmse = mean_squared_error(y_train, y_preds_train, squared=False)

    rf_train = pd.DataFrame({'mae': mae,
                             'mse': mse,
                             'rmse': rmse},
                            index=[index_train])

    mse = mean_squared_error(y_test, y_preds_test)
    mae = mean_absolute_error(y_test, y_preds_test)
    rmse = mean_squared_error(y_test, y_preds_test, squared=False)

    rf_test = pd.DataFrame({'mae': mae,
                            'mse': mse,
                            'rmse': rmse},
                           index=[index_test])
    
    dt_models = pd.concat([rf_train, rf_test])
    
    feature_importances = regressor.feature_importances_
    sorted_indices = np.argsort(feature_importances)[::-1]
    
    feature_names = X_train.columns.values
    
    sorted_feature_importances = feature_importances[sorted_indices]
    sorted_feature_names = feature_names[sorted_indices]
    
    plt.figure(figsize=(6, 20))
    plt.barh(range(len(sorted_feature_importances)), sorted_feature_importances[::-1], align='center')
    plt.yticks(range(len(sorted_feature_importances)), sorted_feature_names[::-1])
    plt.xlabel("Feature Importance")
    plt.ylabel("Features")
    plt.title("Random Forest Regressor - Feature Importance")
    plt.show()
    
    return dt_models

1. Random Forest Regressor with All Features¶

In [429]:
rf_tuning = random_forest_regressor_tuning(X_train, y_train, X_val, y_val, 'RF_Tune_Train', 'RF_Tune_Val')
rf_tuning
Out[429]:
mae mse rmse
RF_Tune_Train 2163.637111 1.120318e+07 3347.114556
RF_Tune_Val 2368.454600 1.100253e+07 3317.006576

2. Random Forest Regressor with Feature Engineering 1 (Correlation > 0.5)¶

In [430]:
rf_feature1_tuning = random_forest_regressor_tuning(X_train_feature1, y_train, X_val_feature1, y_val, 'RF_Tune_FEATURE1_Train', 'RF_Tune_FEATURE1_Val')
rf_feature1_tuning
Out[430]:
mae mse rmse
RF_Tune_FEATURE1_Train 2217.155423 1.176883e+07 3430.572444
RF_Tune_FEATURE1_Val 2406.069268 1.134822e+07 3368.712046

3. Random Forest Regressor with Feature Engineering 2 (Correlation > 0.7)¶

In [431]:
rf_feature2_tuning = random_forest_regressor_tuning(X_train_feature2, y_train, X_val_feature2, y_val, 'RF_Tune_FEATURE2_Train', 'RF_Tune_FEATURE2_Val')
rf_feature2_tuning
Out[431]:
mae mse rmse
RF_Tune_FEATURE2_Train 2162.109950 1.127638e+07 3358.032533
RF_Tune_FEATURE2_Val 2389.063839 1.112963e+07 3336.109354

4. Random Forest Regressor with Feature Engineering 3 (Correlation > 0.8)¶

In [432]:
rf_feature3_tuning = random_forest_regressor_tuning(X_train_feature3, y_train, X_val_feature3, y_val, 'RF_Tune_FEATURE3_Train', 'RF_Tune_FEATURE3_Val')
rf_feature3_tuning
Out[432]:
mae mse rmse
RF_Tune_FEATURE3_Train 2216.333066 1.176684e+07 3430.283430
RF_Tune_FEATURE3_Val 2403.261234 1.134797e+07 3368.674639
In [433]:
rf_model = pd.concat([rf, rf_feature1, rf_feature2, rf_feature3, rf_tuning, rf_feature1_tuning, rf_feature2_tuning, rf_feature3_tuning])
rf_model
Out[433]:
mae mse rmse
RFregressor_Train 952.337505 2.045942e+06 1430.364188
RFregressor_Val 2203.343703 9.785882e+06 3128.239434
RFregressor_FEATURE1_Train 969.471965 2.116189e+06 1454.712785
RFregressor_FEATURE1_Val 2187.242154 9.871982e+06 3141.970979
RFregressor_FEATURE2_Train 954.917457 2.064596e+06 1436.870097
RFregressor_FEATURE2_Val 2206.446608 9.923375e+06 3150.138845
RFregressor_FEATURE3_Train 969.307015 2.117008e+06 1454.994201
RFregressor_FEATURE3_Val 2190.343906 9.880833e+06 3143.379168
RF_Tune_Train 2163.637111 1.120318e+07 3347.114556
RF_Tune_Val 2368.454600 1.100253e+07 3317.006576
RF_Tune_FEATURE1_Train 2217.155423 1.176883e+07 3430.572444
RF_Tune_FEATURE1_Val 2406.069268 1.134822e+07 3368.712046
RF_Tune_FEATURE2_Train 2162.109950 1.127638e+07 3358.032533
RF_Tune_FEATURE2_Val 2389.063839 1.112963e+07 3336.109354
RF_Tune_FEATURE3_Train 2216.333066 1.176684e+07 3430.283430
RF_Tune_FEATURE3_Val 2403.261234 1.134797e+07 3368.674639

VII. XGBoost¶

In [434]:
from sklearn.ensemble import GradientBoostingRegressor

XGBoost Function¶

In [435]:
def xgboost(X_train, y_train, X_test, y_test, index_train, index_test):
    xgboost_reg = GradientBoostingRegressor().fit(X_train, y_train)
    
    y_preds_train = xgboost_reg.predict(X_train)
    y_preds_test = xgboost_reg.predict(X_test)

    mse = mean_squared_error(y_preds_train, y_train)
    mae = mean_absolute_error(y_preds_train, y_train)
    rmse = mean_squared_error(y_preds_train, y_train, squared=False)

    xgboost_train = pd.DataFrame({'mae': mae,
                                  'mse': mse,
                                  'rmse': rmse},
                                 index=[index_train])

    mse = mean_squared_error(y_preds_test, y_test)
    mae = mean_absolute_error(y_preds_test, y_test)
    rmse = mean_squared_error(y_preds_test, y_test, squared=False)

    xgboost_test = pd.DataFrame({'mae': mae,
                                 'mse': mse,
                                 'rmse': rmse},
                                index=[index_test])

    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(16, 6))

    # Plot the predicted vs actual target values for the training set
    axes[0].plot(y_train, y_preds_train, 'o',
                 color='orange', label='Predictions')
    axes[0].plot(y_train, y_train, '-', color='red', label='Actual')

    axes[0].set_xlabel('Actual')
    axes[0].set_ylabel('Predicted')
    axes[0].set_title(
        f'{index_train}: Comparison of Actual vs. Predicted Target')

    axes[0].legend()

    axes[1].plot(y_test, y_preds_test, 'o',
                 color='orange', label='Predictions')
    axes[1].plot(y_test, y_test, '-', color='red', label='Actual')

    axes[1].set_xlabel('Actual')
    axes[1].set_ylabel('Predicted')
    axes[1].set_title(
        f'{index_test}: Comparison of Actual vs. Predicted Target')

    axes[1].legend()

    plt.show()

    xgboost_models = pd.concat([xgboost_train, xgboost_test])

    return xgboost_models

1. XGBoost with All Features¶

In [436]:
xg = xgboost(X_train, y_train, X_val, y_val, 'XGBoost_Train', 'XGBoost_Val')
xg
Out[436]:
mae mse rmse
XGBoost_Train 1062.784899 2.039445e+06 1428.091324
XGBoost_Val 2370.002169 1.150793e+07 3392.334072

2. XGBoost Feature Engineering 1 (Correlation >= 0.5)¶

In [437]:
xg_feature1 = xgboost(X_train_feature1, y_train, X_val_feature1, y_val, 'XGBoost_FEATURE1_Train', 'XGBoost_FEATURE1_Val')
xg_feature1
Out[437]:
mae mse rmse
XGBoost_FEATURE1_Train 1251.945221 2.949687e+06 1717.465394
XGBoost_FEATURE1_Val 2381.406002 1.206589e+07 3473.598919

3. XGBoost Feature Engineering 2 (Correlation >= 0.7)¶

In [438]:
xg_feature2 = xgboost(X_train_feature2, y_train, X_val_feature2, y_val, 'XGBoost_FEATURE2_Train', 'XGBoost_FEATURE2_Val')
xg_feature2
Out[438]:
mae mse rmse
XGBoost_FEATURE2_Train 1114.508146 2.326707e+06 1525.354768
XGBoost_FEATURE2_Val 2330.471706 1.189117e+07 3448.357665

4. XGBoost Feature Engineering 3 (Correlation >= 0.8)¶

In [439]:
xg_feature3 = xgboost(X_train_feature3, y_train, X_val_feature3, y_val, 'XGBoost_FEATURE3_Train', 'XGBoost_FEATURE3_Val')
xg_feature3
Out[439]:
mae mse rmse
XGBoost_FEATURE3_Train 1251.945221 2.949687e+06 1717.465394
XGBoost_FEATURE3_Val 2390.399367 1.221464e+07 3494.944348

Reduce Overfitting with Learning Rate(eta)¶

In [440]:
learning_rate = [0.001, 0.01, 0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 1]

train_mae = []
val_mae = []
train_mse = []
val_mse = []
train_rmse = []
val_rmse = []

for k in learning_rate:
    regressor = GradientBoostingRegressor(random_state=42, learning_rate=k).fit(X_train, y_train)

    y_preds_train = regressor.predict(X_train)
    y_preds_val = regressor.predict(X_val)

    train_mae.append(mean_absolute_error(y_train, y_preds_train))
    val_mae.append(mean_absolute_error(y_val, y_preds_val))

    train_mse.append(mean_squared_error(y_train, y_preds_train))
    val_mse.append(mean_squared_error(y_val, y_preds_val))

    train_rmse.append(mean_squared_error(y_train, y_preds_train, squared=False))
    val_rmse.append(mean_squared_error(y_val, y_preds_val, squared=False))

result_learning_rate = pd.DataFrame({'train_mae': train_mae,
                                 'test_mae': val_mae,
                                 'train_mse': train_mse,
                                 'test_mse': val_mse,
                                 'train_rmse': train_rmse,
                                 'test_rmse': val_rmse}, index=learning_rate)
result_learning_rate
Out[440]:
train_mae test_mae train_mse test_mse train_rmse test_rmse
0.001 7637.367424 7955.366567 9.355801e+07 9.258699e+07 9672.538934 9622.213450
0.010 3905.271359 3931.880791 2.514555e+07 2.310171e+07 5014.533708 4806.423923
0.100 1062.784899 2370.547474 2.039445e+06 1.118598e+07 1428.091324 3344.544623
0.150 774.505422 2394.947704 1.058870e+06 1.093955e+07 1029.014099 3307.499695
0.200 577.790777 2377.679801 5.425335e+05 1.116542e+07 736.568745 3341.469584
0.300 313.369595 2315.334644 1.605535e+05 1.056805e+07 400.691277 3250.854066
0.400 207.376570 2513.281874 7.047907e+04 1.281325e+07 265.478941 3579.559796
0.500 109.283164 2778.266649 2.058349e+04 1.437993e+07 143.469464 3792.087221
1.000 14.395943 3540.087158 3.462054e+02 2.565066e+07 18.606595 5064.647788
In [441]:
plot_performance(learning_rate, 'Number of learning_rate')

The best learning_rate = 0.3

Reduce Overfitting with n_estimators¶

In [442]:
n_estimators = [2, 5, 10, 20, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500]

train_mae = []
val_mae = []
train_mse = []
val_mse = []
train_rmse = []
val_rmse = []

for k in n_estimators:
    regressor = GradientBoostingRegressor(random_state=42, learning_rate=0.3, n_estimators=k).fit(X_train, y_train)

    y_preds_train = regressor.predict(X_train)
    y_preds_val = regressor.predict(X_val)

    train_mae.append(mean_absolute_error(y_train, y_preds_train))
    val_mae.append(mean_absolute_error(y_val, y_preds_val))

    train_mse.append(mean_squared_error(y_train, y_preds_train))
    val_mse.append(mean_squared_error(y_val, y_preds_val))

    train_rmse.append(mean_squared_error(y_train, y_preds_train, squared=False))
    val_rmse.append(mean_squared_error(y_val, y_preds_val, squared=False))

result_n_estimators = pd.DataFrame({'train_mae': train_mae,
                                 'test_mae': val_mae,
                                 'train_mse': train_mse,
                                 'test_mse': val_mse,
                                 'train_rmse': train_rmse,
                                 'test_rmse': val_rmse}, index=n_estimators)
result_n_estimators
Out[442]:
train_mae test_mae train_mse test_mse train_rmse test_rmse
2 4809.319733 4902.595605 3.754147e+07 3.491928e+07 6127.109831 5909.253875
5 2722.114794 2678.412655 1.323165e+07 1.176303e+07 3637.533961 3429.727588
10 1897.562236 2159.119324 7.202624e+06 9.456199e+06 2683.770470 3075.093266
20 1477.064331 2203.857035 4.099596e+06 9.593891e+06 2024.745920 3097.400617
50 792.346707 2321.985863 9.871235e+05 1.065937e+07 993.540893 3264.868751
100 313.369595 2315.334644 1.605535e+05 1.056805e+07 400.691277 3250.854066
150 144.448363 2310.388526 3.341922e+04 1.052438e+07 182.809255 3244.130283
200 65.244271 2314.904790 6.547823e+03 1.055840e+07 80.918621 3249.369306
250 28.543625 2317.728432 1.360793e+03 1.056667e+07 36.888930 3250.641373
300 13.864842 2318.692472 3.140040e+02 1.057643e+07 17.720157 3252.141681
350 6.197603 2318.552079 6.283897e+01 1.057725e+07 7.927104 3252.269032
400 3.028791 2318.410223 1.456341e+01 1.057707e+07 3.816203 3252.240054
450 1.490880 2318.125249 3.513121e+00 1.057522e+07 1.874332 3251.955903
500 0.736964 2318.156858 8.794494e-01 1.057561e+07 0.937790 3252.016988
In [443]:
plot_performance(n_estimators, 'Number of n_estimators')

The best n_estimators = 10

Reduce Overfitting with max_depth¶

In [444]:
max_depth = list(range(1, 20))

train_mae = []
val_mae = []
train_mse = []
val_mse = []
train_rmse = []
val_rmse = []

for k in max_depth:
    regressor = GradientBoostingRegressor(random_state=42, learning_rate=0.3, n_estimators=10, max_depth=k).fit(X_train, y_train)

    y_preds_train = regressor.predict(X_train)
    y_preds_val = regressor.predict(X_val)

    train_mae.append(mean_absolute_error(y_train, y_preds_train))
    val_mae.append(mean_absolute_error(y_val, y_preds_val))

    train_mse.append(mean_squared_error(y_train, y_preds_train))
    val_mse.append(mean_squared_error(y_val, y_preds_val))

    train_rmse.append(mean_squared_error(y_train, y_preds_train, squared=False))
    val_rmse.append(mean_squared_error(y_val, y_preds_val, squared=False))

result_max_depth = pd.DataFrame({'train_mae': train_mae,
                                 'test_mae': val_mae,
                                 'train_mse': train_mse,
                                 'test_mse': val_mse,
                                 'train_rmse': train_rmse,
                                 'test_rmse': val_rmse}, index=max_depth)
result_max_depth
Out[444]:
train_mae test_mae train_mse test_mse train_rmse test_rmse
1 3033.378689 3030.005878 1.813618e+07 1.607499e+07 4258.659469 4009.362335
2 2245.290935 2456.658308 1.015635e+07 1.192099e+07 3186.902536 3452.678622
3 1897.562236 2159.119324 7.202624e+06 9.456199e+06 2683.770470 3075.093266
4 1419.249862 2416.253854 3.876618e+06 1.078335e+07 1968.912975 3283.801910
5 999.995619 2472.558868 1.678759e+06 1.165968e+07 1295.669407 3414.628081
6 640.556450 2481.935044 6.283118e+05 1.286504e+07 792.661203 3586.787037
7 470.013470 2493.233621 3.410014e+05 1.262973e+07 583.953221 3553.832539
8 335.950780 2656.258139 1.634064e+05 1.408870e+07 404.235563 3753.492209
9 295.677782 2662.952255 1.278064e+05 1.531712e+07 357.500253 3913.709262
10 248.782725 2598.521394 9.675634e+04 1.453755e+07 311.056815 3812.814487
11 245.523077 2826.931898 9.516235e+04 1.689388e+07 308.483949 4110.216132
12 237.048890 2809.766082 9.032624e+04 1.737428e+07 300.543237 4168.246205
13 234.917294 2787.765145 8.917618e+04 1.630613e+07 298.623812 4038.084888
14 234.685055 2852.724386 8.861161e+04 1.695581e+07 297.677025 4117.743422
15 234.664120 3032.458092 8.834857e+04 1.933811e+07 297.234878 4397.511641
16 234.664120 2972.875571 8.829941e+04 1.938559e+07 297.152159 4402.907360
17 234.664120 3000.691313 8.826067e+04 1.981977e+07 297.086973 4451.939605
18 234.664120 2987.046102 8.826392e+04 1.863146e+07 297.092440 4316.417172
19 234.664120 3026.466533 8.825935e+04 1.977414e+07 297.084759 4446.812458
In [445]:
plot_performance(max_depth, 'Number of max_depth')

max_depth = 3

Reduce Overfitting with min_samples_split¶

In [446]:
min_sample_split = [2, 5, 10, 20, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500]

train_mae = []
val_mae = []
train_mse = []
val_mse = []
train_rmse = []
val_rmse = []

for k in min_sample_split:
    regressor = GradientBoostingRegressor(random_state=42, learning_rate=0.3, n_estimators=10, max_depth=3, min_samples_split=k).fit(X_train, y_train)

    y_preds_train = regressor.predict(X_train)
    y_preds_val = regressor.predict(X_val)

    train_mae.append(mean_absolute_error(y_train, y_preds_train))
    val_mae.append(mean_absolute_error(y_val, y_preds_val))

    train_mse.append(mean_squared_error(y_train, y_preds_train))
    val_mse.append(mean_squared_error(y_val, y_preds_val))

    train_rmse.append(mean_squared_error(y_train, y_preds_train, squared=False))
    val_rmse.append(mean_squared_error(y_val, y_preds_val, squared=False))

result_min_split = pd.DataFrame({'train_mae': train_mae,
                                 'test_mae': val_mae,
                                 'train_mse': train_mse,
                                 'test_mse': val_mse,
                                 'train_rmse': train_rmse,
                                 'test_rmse': val_rmse}, index=min_sample_split)
result_min_split
Out[446]:
train_mae test_mae train_mse test_mse train_rmse test_rmse
2 1897.562236 2159.119324 7.202624e+06 9.456199e+06 2683.770470 3075.093266
5 1898.441015 2159.119324 7.276317e+06 9.456199e+06 2697.464861 3075.093266
10 1883.888592 2223.960670 7.101234e+06 1.012880e+07 2664.814129 3182.577178
20 1889.719239 2221.183396 7.194737e+06 1.012559e+07 2682.300669 3182.073600
50 1948.157575 2225.500680 7.593369e+06 1.022215e+07 2755.606806 3197.209912
100 2096.057554 2384.481232 9.258531e+06 1.087457e+07 3042.783404 3297.660784
150 2146.909054 2377.424572 9.811386e+06 1.117483e+07 3132.313169 3342.877288
200 2216.788479 2464.222510 1.054611e+07 1.134667e+07 3247.477312 3368.481882
250 2265.416683 2447.021771 1.106094e+07 1.088560e+07 3325.799465 3299.333507
300 2314.863814 2506.663830 1.128385e+07 1.183549e+07 3359.143898 3440.274393
350 2392.093288 2610.630323 1.176835e+07 1.231941e+07 3430.502935 3509.902016
400 2434.891181 2498.018701 1.269111e+07 1.218749e+07 3562.458562 3491.059159
450 2603.960333 2718.318055 1.388002e+07 1.292833e+07 3725.590335 3595.598472
500 2783.687584 2816.446267 1.549812e+07 1.390165e+07 3936.765194 3728.490992
In [447]:
plot_performance(min_sample_split, 'Number of min_samples_split')

The best min_samples_split = 5

Reduce Overfitting with min_samples_leaf¶

In [448]:
min_samples_leaf = [1, 2, 5, 10, 20, 50, 100, 150, 200]

train_mae = []
val_mae = []
train_mse = []
val_mse = []
train_rmse = []
val_rmse = []

for k in min_samples_leaf:
    regressor = GradientBoostingRegressor(random_state=42, learning_rate=0.3, n_estimators=10, max_depth=3, min_samples_split=5, min_samples_leaf=k).fit(X_train, y_train)

    y_preds_train = regressor.predict(X_train)
    y_preds_val = regressor.predict(X_val)

    train_mae.append(mean_absolute_error(y_train, y_preds_train))
    val_mae.append(mean_absolute_error(y_val, y_preds_val))

    train_mse.append(mean_squared_error(y_train, y_preds_train))
    val_mse.append(mean_squared_error(y_val, y_preds_val))

    train_rmse.append(mean_squared_error(y_train, y_preds_train, squared=False))
    val_rmse.append(mean_squared_error(y_val, y_preds_val, squared=False))

result_min_leaf = pd.DataFrame({'train_mae': train_mae,
                                 'test_mae': val_mae,
                                 'train_mse': train_mse,
                                 'test_mse': val_mse,
                                 'train_rmse': train_rmse,
                                 'test_rmse': val_rmse}, index=min_samples_leaf)
result_min_leaf
Out[448]:
train_mae test_mae train_mse test_mse train_rmse test_rmse
1 1898.441015 2159.119324 7.276317e+06 9.456199e+06 2697.464861 3075.093266
2 1874.827835 2189.431910 7.268464e+06 9.714303e+06 2696.008869 3116.777656
5 1901.049412 2228.578556 7.681601e+06 1.003300e+07 2771.570045 3167.491331
10 1892.191028 2315.660004 7.659879e+06 1.107284e+07 2767.648591 3327.588306
20 2045.176632 2403.469490 8.953731e+06 1.128570e+07 2992.278621 3359.420099
50 2209.278612 2502.178592 1.136276e+07 1.180240e+07 3370.869384 3435.462618
100 2555.035352 2634.764948 1.544620e+07 1.431501e+07 3930.165889 3783.517729
150 3086.782447 3017.239913 2.107417e+07 1.943793e+07 4590.661456 4408.846623
200 3768.490867 3503.482440 2.872838e+07 2.346540e+07 5359.886515 4844.109647
In [449]:
plot_performance(min_samples_leaf, 'Number of min_samples_leaf')

The best min_sampls_leaf = 1

Reduce Overfitting with max_features¶

In [450]:
max_features = list(range(1, 70))

train_mae = []
val_mae = []
train_mse = []
val_mse = []
train_rmse = []
val_rmse = []

for k in max_features:
    regressor = GradientBoostingRegressor(random_state=42, learning_rate=0.3, n_estimators=10, max_depth=3, min_samples_split=5, min_samples_leaf=1, max_features=k).fit(X_train, y_train)

    y_preds_train = regressor.predict(X_train)
    y_preds_val = regressor.predict(X_val)

    train_mae.append(mean_absolute_error(y_train, y_preds_train))
    val_mae.append(mean_absolute_error(y_val, y_preds_val))

    train_mse.append(mean_squared_error(y_train, y_preds_train))
    val_mse.append(mean_squared_error(y_val, y_preds_val))

    train_rmse.append(mean_squared_error(y_train, y_preds_train, squared=False))
    val_rmse.append(mean_squared_error(y_val, y_preds_val, squared=False))

result_max_features = pd.DataFrame({'train_mae': train_mae,
                                 'test_mae': val_mae,
                                 'train_mse': train_mse,
                                 'test_mse': val_mse,
                                 'train_rmse': train_rmse,
                                 'test_rmse': val_rmse}, index=max_features)
result_max_features
Out[450]:
train_mae test_mae train_mse test_mse train_rmse test_rmse
1 2992.389833 3547.188460 1.554257e+07 2.249611e+07 3942.406301 4743.005916
2 2574.093971 3019.681125 1.261069e+07 1.712248e+07 3551.153268 4137.931304
3 2275.514408 2477.041671 1.013244e+07 1.175872e+07 3183.149577 3429.098479
4 2298.232125 2802.968884 1.082636e+07 1.403492e+07 3290.343262 3746.321089
5 2237.959004 2657.553610 9.969061e+06 1.269945e+07 3157.382059 3563.629201
6 2106.562001 2458.637054 9.242476e+06 1.124718e+07 3040.144036 3353.681524
7 2037.578828 2583.834179 8.646991e+06 1.248260e+07 2940.576657 3533.072451
8 2011.943922 2393.731295 8.419335e+06 1.138662e+07 2901.608961 3374.407129
9 2038.764307 2407.997667 8.009854e+06 1.124592e+07 2830.168597 3353.494102
10 1998.737805 2206.331138 8.201365e+06 8.736292e+06 2863.802593 2955.721868
11 2014.367212 2482.064230 8.414228e+06 1.160916e+07 2900.728954 3407.221816
12 2085.881434 2330.761701 8.934988e+06 1.049240e+07 2989.145060 3239.197152
13 2061.457960 2271.215899 8.482537e+06 1.029450e+07 2912.479531 3208.504163
14 2003.339351 2311.735219 8.103362e+06 1.000137e+07 2846.640435 3162.494532
15 2013.889376 2496.186374 8.092926e+06 1.252522e+07 2844.806935 3539.098203
16 2058.052176 2203.388316 8.495790e+06 9.529086e+06 2914.753842 3086.921740
17 1939.063923 2396.947164 7.873419e+06 1.097980e+07 2805.961264 3313.577843
18 1982.051029 2398.589982 8.107846e+06 1.111559e+07 2847.427919 3334.005166
19 1953.171736 2450.729489 7.992664e+06 1.168225e+07 2827.129945 3417.930505
20 2000.091070 2543.037171 8.185593e+06 1.290967e+07 2861.047483 3593.002678
21 1961.152666 2309.144296 7.710684e+06 1.045117e+07 2776.811847 3232.826858
22 1943.273149 2366.941002 7.427542e+06 1.046259e+07 2725.351670 3234.592261
23 1957.332048 2426.631432 7.416235e+06 1.220879e+07 2723.276450 3494.107539
24 1912.032472 2367.153511 7.695077e+06 1.054375e+07 2774.000167 3247.113593
25 2016.562669 2421.440937 8.309338e+06 1.178134e+07 2882.592320 3432.395584
26 1952.019032 2477.070143 7.695946e+06 1.193328e+07 2774.156763 3454.458097
27 1958.096399 2349.317443 8.054867e+06 1.069258e+07 2838.109828 3269.950592
28 1940.208625 2528.815554 7.629295e+06 1.286331e+07 2762.117835 3586.545442
29 1965.596148 2401.075885 7.947390e+06 1.121367e+07 2819.111474 3348.681779
30 1969.630044 2448.475997 7.943748e+06 1.276058e+07 2818.465461 3572.195182
31 2020.749465 2416.029432 7.970437e+06 1.115745e+07 2823.196169 3340.276358
32 1919.364014 2439.772242 7.908414e+06 1.141210e+07 2812.190168 3378.180035
33 1909.791418 2514.905207 7.012936e+06 1.184286e+07 2648.194799 3441.345926
34 1906.001625 2389.177372 7.666696e+06 1.094239e+07 2768.879968 3307.928604
35 1953.780610 2420.255617 7.763947e+06 1.121581e+07 2786.385938 3349.001724
36 1986.557154 2277.525776 8.221187e+06 1.024731e+07 2867.261300 3201.141697
37 1896.047226 2399.734026 7.151915e+06 1.144981e+07 2674.306398 3383.756173
38 1897.733189 2278.857193 7.216185e+06 9.882588e+06 2686.295745 3143.658362
39 1919.316434 2530.867845 7.820896e+06 1.268273e+07 2796.586510 3561.282753
40 1913.376595 2346.555895 7.383016e+06 1.094623e+07 2717.170640 3308.508458
41 1823.386168 2365.026844 6.858351e+06 1.107708e+07 2618.845331 3328.224871
42 1907.930966 2272.248721 7.436039e+06 1.048804e+07 2726.910086 3238.524763
43 1904.423842 2515.061461 7.231355e+06 1.229140e+07 2689.117828 3505.908649
44 1950.866619 2493.243156 8.014659e+06 1.184041e+07 2831.017298 3440.989388
45 1923.274910 2517.585887 7.466743e+06 1.258081e+07 2732.534195 3546.944003
46 1844.362405 2434.269232 7.093039e+06 1.141351e+07 2663.275901 3378.388956
47 1916.309631 2348.536953 7.229358e+06 1.065626e+07 2688.746623 3264.392825
48 1931.464973 2428.566394 7.293784e+06 1.140836e+07 2700.700707 3377.626377
49 1923.513881 2387.288767 7.916913e+06 1.126563e+07 2813.700917 3356.431746
50 1879.004111 2481.924378 7.225423e+06 1.265682e+07 2688.014672 3557.642438
51 1896.152620 2454.064629 7.067025e+06 1.235377e+07 2658.387601 3514.792851
52 1906.914574 2488.217491 7.508954e+06 1.133260e+07 2740.246982 3366.393418
53 1904.590783 2461.580204 7.140712e+06 1.123760e+07 2672.211070 3352.252968
54 1908.189896 2288.106879 7.700862e+06 1.032449e+07 2775.042634 3213.174359
55 1960.302362 2336.407477 7.897687e+06 1.093127e+07 2810.282372 3306.247020
56 1925.309409 2519.647891 7.385809e+06 1.258542e+07 2717.684480 3547.592975
57 1859.983744 2365.354084 7.358989e+06 1.156973e+07 2712.745745 3401.429939
58 1911.345778 2376.792081 7.364006e+06 1.091945e+07 2713.670167 3304.458750
59 1882.376384 2521.514272 6.862256e+06 1.240455e+07 2619.590727 3522.009492
60 1848.518274 2474.497431 7.077519e+06 1.251163e+07 2660.360727 3537.178004
61 1921.708254 2441.146370 7.471476e+06 1.146756e+07 2733.400030 3386.378698
62 1884.062272 2457.097806 7.595409e+06 1.145473e+07 2755.976961 3384.484303
63 1882.603689 2428.898940 7.487282e+06 1.300150e+07 2736.289841 3605.759860
64 1869.619868 2325.320570 6.930000e+06 1.011610e+07 2632.489291 3180.581164
65 1901.388890 2388.071904 7.413743e+06 1.171192e+07 2722.818881 3422.267844
66 1887.903106 2399.645214 7.102355e+06 1.202512e+07 2665.024340 3467.725477
67 1867.639031 2389.641466 7.056586e+06 1.149900e+07 2656.423582 3391.017114
68 1913.423277 2290.361412 6.992975e+06 9.765529e+06 2644.423333 3124.984680
69 1858.173315 2463.036674 6.999651e+06 1.212339e+07 2645.685356 3481.865620
In [451]:
plot_performance(max_features, 'Number of max_features')

The best max_features = 10

After Hyperparameter Tuning¶

GradientBoosting Regressor function

In [452]:
def xgboost_tuning(X_train, y_train, X_test, y_test, index_train, index_test):
    xgboost_reg = GradientBoostingRegressor(random_state=42, learning_rate=0.3, n_estimators=10, max_depth=3, min_samples_split=5, min_samples_leaf=1, max_features=10).fit(X_train, y_train)
    
    y_preds_train = xgboost_reg.predict(X_train)
    y_preds_test = xgboost_reg.predict(X_test)

    mse = mean_squared_error(y_preds_train, y_train)
    mae = mean_absolute_error(y_preds_train, y_train)
    rmse = mean_squared_error(y_preds_train, y_train, squared=False)

    xgboost_train = pd.DataFrame({'mae': mae,
                                  'mse': mse,
                                  'rmse': rmse},
                                 index=[index_train])

    mse = mean_squared_error(y_preds_test, y_test)
    mae = mean_absolute_error(y_preds_test, y_test)
    rmse = mean_squared_error(y_preds_test, y_test, squared=False)

    xgboost_test = pd.DataFrame({'mae': mae,
                                 'mse': mse,
                                 'rmse': rmse},
                                index=[index_test])

    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(16, 6))

    axes[0].plot(y_train, y_preds_train, 'o',
                 color='orange', label='Predictions')
    axes[0].plot(y_train, y_train, '-', color='red', label='Actual')

    axes[0].set_xlabel('Actual')
    axes[0].set_ylabel('Predicted')
    axes[0].set_title(
        f'{index_train}: Comparison of Actual vs. Predicted Target')

    axes[0].legend()

    axes[1].plot(y_test, y_preds_test, 'o',
                 color='orange', label='Predictions')
    axes[1].plot(y_test, y_test, '-', color='red', label='Actual')

    axes[1].set_xlabel('Actual')
    axes[1].set_ylabel('Predicted')
    axes[1].set_title(
        f'{index_test}: Comparison of Actual vs. Predicted Target')

    axes[1].legend()

    plt.show()

    xgboost_models = pd.concat([xgboost_train, xgboost_test])

    return xgboost_models

1. XGBoost with All Features¶

In [453]:
xg_tuning = xgboost_tuning(X_train, y_train, X_val, y_val, 'XGBoost_Tune_Train', 'XGBosst_Tune_Val')
xg_tuning
Out[453]:
mae mse rmse
XGBoost_Tune_Train 1998.737805 8.201365e+06 2863.802593
XGBosst_Tune_Val 2206.331138 8.736292e+06 2955.721868

2. XGBoost Feature Engineering 1 (Correlation >= 0.5)¶

In [454]:
xg_feature1_tuning = xgboost_tuning(X_train_feature1, y_train, X_val_feature1, y_val, 'XGBoost_Tune_FEATURE1_Train', 'XGBosst_Tune_FEATURE1_Val')
xg_feature1_tuning
Out[454]:
mae mse rmse
XGBoost_Tune_FEATURE1_Train 1987.072765 8.215540e+06 2866.276384
XGBosst_Tune_FEATURE1_Val 2301.942297 1.115216e+07 3339.484760

3. XGBoost Feature Engineering 2 (Correlation >= 0.7)¶

In [455]:
xg_feature2_tuning = xgboost_tuning(X_train_feature2, y_train, X_val_feature2, y_val, 'XGBoost_Tune_FEATURE2_Train', 'XGBosst_Tune_FEATURE2_Val')
xg_feature2_tuning
Out[455]:
mae mse rmse
XGBoost_Tune_FEATURE2_Train 1940.521103 7.589272e+06 2754.863299
XGBosst_Tune_FEATURE2_Val 2276.243249 1.131108e+07 3363.195538

4. XGBoost Feature Engineering 3 (Correlation >= 0.8)¶

In [456]:
xg_feature3_tuning = xgboost_tuning(X_train_feature3, y_train, X_val_feature3, y_val, 'XGBoost_Tune_FEATURE3_Train', 'XGBosst_Tune_FEATURE3_Val')
xg_feature3_tuning
Out[456]:
mae mse rmse
XGBoost_Tune_FEATURE3_Train 1987.072765 8.215540e+06 2866.276384
XGBosst_Tune_FEATURE3_Val 2306.776818 1.116378e+07 3341.225026

Randmoized Search¶

In [457]:
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import uniform, randint
In [458]:
param_grid = {
    'learning_rate': uniform(0.001, 0.3),
    'n_estimators': randint(2, 100),
    'max_depth': randint(1, 5),
    'min_samples_split': [2, 5, 10, 20, 50, 100],
    'min_samples_leaf': [1, 2, 5, 10, 20, 50, 100],
    'max_features': randint(1, 10)
}
In [459]:
gb_regressor = GradientBoostingRegressor()
In [460]:
random_search = RandomizedSearchCV(
    estimator=gb_regressor,
    param_distributions=param_grid,
    n_iter=10,
    scoring='neg_mean_squared_error',
    cv=5,
    random_state=42
)
In [461]:
random_search.fit(X_train, y_train)
Out[461]:
RandomizedSearchCV(cv=5, estimator=GradientBoostingRegressor(),
                   param_distributions={'learning_rate': <scipy.stats._distn_infrastructure.rv_continuous_frozen object at 0x386016980>,
                                        'max_depth': <scipy.stats._distn_infrastructure.rv_discrete_frozen object at 0x46ef06230>,
                                        'max_features': <scipy.stats._distn_infrastructure.rv_discrete_frozen object at 0x46ef074f0>,
                                        'min_samples_leaf': [1, 2, 5, 10, 20,
                                                             50, 100],
                                        'min_samples_split': [2, 5, 10, 20, 50,
                                                              100],
                                        'n_estimators': <scipy.stats._distn_infrastructure.rv_discrete_frozen object at 0x41bed3550>},
                   random_state=42, scoring='neg_mean_squared_error')
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
RandomizedSearchCV(cv=5, estimator=GradientBoostingRegressor(),
                   param_distributions={'learning_rate': <scipy.stats._distn_infrastructure.rv_continuous_frozen object at 0x386016980>,
                                        'max_depth': <scipy.stats._distn_infrastructure.rv_discrete_frozen object at 0x46ef06230>,
                                        'max_features': <scipy.stats._distn_infrastructure.rv_discrete_frozen object at 0x46ef074f0>,
                                        'min_samples_leaf': [1, 2, 5, 10, 20,
                                                             50, 100],
                                        'min_samples_split': [2, 5, 10, 20, 50,
                                                              100],
                                        'n_estimators': <scipy.stats._distn_infrastructure.rv_discrete_frozen object at 0x41bed3550>},
                   random_state=42, scoring='neg_mean_squared_error')
GradientBoostingRegressor()
GradientBoostingRegressor()
In [462]:
best_params = random_search.best_params_
best_estimator = random_search.best_score_
In [463]:
best_params
Out[463]:
{'learning_rate': 0.12095829151457664,
 'max_depth': 4,
 'max_features': 3,
 'min_samples_leaf': 10,
 'min_samples_split': 20,
 'n_estimators': 65}
In [464]:
best_estimator
Out[464]:
-15708693.11887255
In [465]:
xg_random = GradientBoostingRegressor(**best_params).fit(X_train, y_train)
In [466]:
y_preds_train = xg_random.predict(X_train)
y_preds_val = xg_random.predict(X_val)

mse = mean_squared_error(y_preds_train, y_train)
mae = mean_absolute_error(y_preds_train, y_train)
rmse = mean_squared_error(y_preds_train, y_train, squared=False)

xgboost_random_train = pd.DataFrame({'mae': mae,
                              'mse': mse,
                              'rmse': rmse},
                             index=['XGBoost_Random_Training'])

mse = mean_squared_error(y_preds_val, y_val)
mae = mean_absolute_error(y_preds_val, y_val)
rmse = mean_squared_error(y_preds_val, y_val, squared=False)

xgboost_random_val = pd.DataFrame({'mae': mae,
                             'mse': mse,
                             'rmse': rmse},
                            index=['XGBoost_Random_Validating'])

xgboost_random = pd.concat([xgboost_random_train, xgboost_random_val])
xgboost_random
Out[466]:
mae mse rmse
XGBoost_Random_Training 1407.143534 4.645281e+06 2155.291443
XGBoost_Random_Validating 2326.201974 1.088273e+07 3298.897608
In [467]:
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(16, 6))


axes[0].plot(y_train, y_preds_train, 'o',
             color='orange', label='Predictions')
axes[0].plot(y_train, y_train, '-', color='red', label='Actual')

axes[0].set_xlabel('Actual')
axes[0].set_ylabel('Predicted')
axes[0].set_title('XGBoost_Random_Train: Comparison of Actual vs. Predicted Target')

axes[0].legend()

axes[1].plot(y_val, y_preds_val, 'o',
             color='orange', label='Predictions')
axes[1].plot(y_val, y_val, '-', color='red', label='Actual')

axes[1].set_xlabel('Actual')
axes[1].set_ylabel('Predicted')
axes[1].set_title('XGBoost_Random_Val: Comparison of Actual vs. Predicted Target')

axes[1].legend()
Out[467]:
<matplotlib.legend.Legend at 0x3867104c0>
In [468]:
xg_model = pd.concat([xg, xg_feature1, xg_feature2, xg_feature3, xg_tuning, xg_feature1_tuning, xg_feature2_tuning, xg_feature3_tuning, xgboost_random])
xg_model
Out[468]:
mae mse rmse
XGBoost_Train 1062.784899 2.039445e+06 1428.091324
XGBoost_Val 2370.002169 1.150793e+07 3392.334072
XGBoost_FEATURE1_Train 1251.945221 2.949687e+06 1717.465394
XGBoost_FEATURE1_Val 2381.406002 1.206589e+07 3473.598919
XGBoost_FEATURE2_Train 1114.508146 2.326707e+06 1525.354768
XGBoost_FEATURE2_Val 2330.471706 1.189117e+07 3448.357665
XGBoost_FEATURE3_Train 1251.945221 2.949687e+06 1717.465394
XGBoost_FEATURE3_Val 2390.399367 1.221464e+07 3494.944348
XGBoost_Tune_Train 1998.737805 8.201365e+06 2863.802593
XGBosst_Tune_Val 2206.331138 8.736292e+06 2955.721868
XGBoost_Tune_FEATURE1_Train 1987.072765 8.215540e+06 2866.276384
XGBosst_Tune_FEATURE1_Val 2301.942297 1.115216e+07 3339.484760
XGBoost_Tune_FEATURE2_Train 1940.521103 7.589272e+06 2754.863299
XGBosst_Tune_FEATURE2_Val 2276.243249 1.131108e+07 3363.195538
XGBoost_Tune_FEATURE3_Train 1987.072765 8.215540e+06 2866.276384
XGBosst_Tune_FEATURE3_Val 2306.776818 1.116378e+07 3341.225026
XGBoost_Random_Training 1407.143534 4.645281e+06 2155.291443
XGBoost_Random_Validating 2326.201974 1.088273e+07 3298.897608

Compare All models¶

1. Multilinear Regression¶

In [469]:
multi_model
Out[469]:
mae mse rmse
Baseline_Train 8307.422361 1.105862e+08 10515.995863
Baseline_Test 7751.203825 1.000556e+08 10002.777477
MultiLinear_Train 2079.423872 9.298125e+06 3049.282748
MultiLinear_Val 2267.698408 1.029542e+07 3208.647385
MultiLinear_Feature1_Train 2070.482628 9.344101e+06 3056.812263
MultiLinear_Feature1_Val 2230.827601 1.010146e+07 3178.280116
MultiLinear_Feature2_Train 2093.693198 1.030223e+07 3209.708360
MultiLinear_Feature2_Val 2123.415974 9.319224e+06 3052.740411
MultiLinear_Feature3_Train 2278.747762 1.223434e+07 3497.762131
MultiLinear_Feature3_Val 2124.895843 9.739218e+06 3120.772055

2. Lasso Regression¶

In [470]:
lasso_model
Out[470]:
mae mse rmse
Lasso_Train 2093.725380 9.353681e+06 3058.378796
Lasso_Val 2279.843057 1.024395e+07 3200.616371
Lasso_FEATURE1_Train 2278.114357 1.223520e+07 3497.885577
Lasso_FEATURE1_Val 2123.981346 9.719629e+06 3117.631893
Lasso_FEATURE2_Train 2092.325287 1.030345e+07 3209.898411
Lasso_FEATURE2_Val 2117.264295 9.286441e+06 3047.366207
Lasso_FEATURE3_Train 2279.940306 1.223561e+07 3497.943503
Lasso_FEATURE3_Val 2127.655356 9.759305e+06 3123.988600

3. Ridge Regression¶

In [471]:
ridge_model
Out[471]:
mae mse rmse
Ridge_Train 2061.966576 9.326546e+06 3053.939442
Ridge_Val 2211.667197 9.930921e+06 3151.336341
Ridge_FEATURE1_Train 2274.841596 1.225304e+07 3500.433978
Ridge_FEATURE1_Val 2108.951082 9.600463e+06 3098.461357
Ridge_FEATURE2_Train 2084.101832 1.031979e+07 3212.443623
Ridge_FEATURE2_Val 2102.691899 9.118768e+06 3019.729731
Ridge_FEATURE3_Train 2278.747159 1.223434e+07 3497.762131
Ridge_FEATURE3_Val 2124.893956 9.739202e+06 3120.769408

4. ElasticNet Regression¶

In [472]:
elastic_model
Out[472]:
mae mse rmse
Elastic_Train 2284.880425 1.238910e+07 3519.815887
Elastic_Val 2073.707562 8.579011e+06 2928.994854
Elastic_FEATURE1_Train 2469.341627 1.406453e+07 3750.270443
Elastic_FEATURE1_Val 2185.184604 9.342266e+06 3056.512135
Elastic_FEATURE2_Train 2321.397873 1.289433e+07 3590.867219
Elastic_FEATURE2_Val 2109.088513 8.697749e+06 2949.194714
Elastic_FEATURE3_Train 2279.672369 1.223548e+07 3497.925509
Elastic_FEATURE3_Val 2126.979548 9.753365e+06 3123.037799

5. Decision Tree Regressor¶

In [473]:
dt_model
Out[473]:
mae mse rmse
Dtregressor_Train 0.000000 0.000000e+00 0.000000
Dtregressor_Val 3022.997923 1.981871e+07 4451.821535
Dtregressor_FEATURE1_Train 0.000000 0.000000e+00 0.000000
Dtregressor_FEATURE1_Val 3142.906175 2.105998e+07 4589.114999
Dtregressor_FEATURE2_Train 0.000000 0.000000e+00 0.000000
Dtregressor_FEATURE2_Val 3379.599290 2.756409e+07 5250.151896
Dtregressor_FEATURE3_Train 0.000000 0.000000e+00 0.000000
Dtregressor_FEATURE3_Val 3090.758525 2.071248e+07 4551.096100
DT_Tune_Train 2428.632887 1.302080e+07 3608.435026
DT_Tune_Val 2377.058551 1.166857e+07 3415.929063
DT_Tune_FEATURE1_Train 2634.547381 1.511255e+07 3887.486135
DT_Tune_FEATURE1_Val 2843.669581 1.846317e+07 4296.879298
DT_Tune_FEATURE2_Train 2640.609970 1.566922e+07 3958.435982
DT_Tune_FEATURE2_Val 2532.673206 1.454901e+07 3814.316998
DT_Tune_FEATURE3_Train 2634.547381 1.511255e+07 3887.486135
DT_Tune_FEATURE3_Val 2855.303550 1.849346e+07 4300.402255

6. Random Forest Regressor¶

In [474]:
rf_model
Out[474]:
mae mse rmse
RFregressor_Train 952.337505 2.045942e+06 1430.364188
RFregressor_Val 2203.343703 9.785882e+06 3128.239434
RFregressor_FEATURE1_Train 969.471965 2.116189e+06 1454.712785
RFregressor_FEATURE1_Val 2187.242154 9.871982e+06 3141.970979
RFregressor_FEATURE2_Train 954.917457 2.064596e+06 1436.870097
RFregressor_FEATURE2_Val 2206.446608 9.923375e+06 3150.138845
RFregressor_FEATURE3_Train 969.307015 2.117008e+06 1454.994201
RFregressor_FEATURE3_Val 2190.343906 9.880833e+06 3143.379168
RF_Tune_Train 2163.637111 1.120318e+07 3347.114556
RF_Tune_Val 2368.454600 1.100253e+07 3317.006576
RF_Tune_FEATURE1_Train 2217.155423 1.176883e+07 3430.572444
RF_Tune_FEATURE1_Val 2406.069268 1.134822e+07 3368.712046
RF_Tune_FEATURE2_Train 2162.109950 1.127638e+07 3358.032533
RF_Tune_FEATURE2_Val 2389.063839 1.112963e+07 3336.109354
RF_Tune_FEATURE3_Train 2216.333066 1.176684e+07 3430.283430
RF_Tune_FEATURE3_Val 2403.261234 1.134797e+07 3368.674639

7. Gradient Boosting Regressor¶

In [475]:
xg_model
Out[475]:
mae mse rmse
XGBoost_Train 1062.784899 2.039445e+06 1428.091324
XGBoost_Val 2370.002169 1.150793e+07 3392.334072
XGBoost_FEATURE1_Train 1251.945221 2.949687e+06 1717.465394
XGBoost_FEATURE1_Val 2381.406002 1.206589e+07 3473.598919
XGBoost_FEATURE2_Train 1114.508146 2.326707e+06 1525.354768
XGBoost_FEATURE2_Val 2330.471706 1.189117e+07 3448.357665
XGBoost_FEATURE3_Train 1251.945221 2.949687e+06 1717.465394
XGBoost_FEATURE3_Val 2390.399367 1.221464e+07 3494.944348
XGBoost_Tune_Train 1998.737805 8.201365e+06 2863.802593
XGBosst_Tune_Val 2206.331138 8.736292e+06 2955.721868
XGBoost_Tune_FEATURE1_Train 1987.072765 8.215540e+06 2866.276384
XGBosst_Tune_FEATURE1_Val 2301.942297 1.115216e+07 3339.484760
XGBoost_Tune_FEATURE2_Train 1940.521103 7.589272e+06 2754.863299
XGBosst_Tune_FEATURE2_Val 2276.243249 1.131108e+07 3363.195538
XGBoost_Tune_FEATURE3_Train 1987.072765 8.215540e+06 2866.276384
XGBosst_Tune_FEATURE3_Val 2306.776818 1.116378e+07 3341.225026
XGBoost_Random_Training 1407.143534 4.645281e+06 2155.291443
XGBoost_Random_Validating 2326.201974 1.088273e+07 3298.897608

The best model is XGboost_Tune:

In [476]:
xg_tuning
Out[476]:
mae mse rmse
XGBoost_Tune_Train 1998.737805 8.201365e+06 2863.802593
XGBosst_Tune_Val 2206.331138 8.736292e+06 2955.721868

Evaluation¶

The best performance model of all is Gradient Bossting (with hyper parameter tuning)¶

In [477]:
xg_regressor = GradientBoostingRegressor(random_state=42, learning_rate=0.3, n_estimators=10, max_depth=3, min_samples_split=5, min_samples_leaf=1, max_features=10).fit(X_train, y_train)
In [478]:
y_preds_train = xg_regressor.predict(X_train)
y_preds_val = xg_regressor.predict(X_val)
y_preds_test = xg_regressor.predict(X_test)

mse = mean_squared_error(y_preds_train, y_train)
mae = mean_absolute_error(y_preds_train, y_train)
rmse = mean_squared_error(y_preds_train, y_train, squared=False)

xgboost_train = pd.DataFrame({'mae': mae,
                              'mse': mse,
                              'rmse': rmse},
                             index=['XGBoost_Training'])

mse = mean_squared_error(y_preds_val, y_val)
mae = mean_absolute_error(y_preds_val, y_val)
rmse = mean_squared_error(y_preds_val, y_val, squared=False)

xgboost_val = pd.DataFrame({'mae': mae,
                             'mse': mse,
                             'rmse': rmse},
                            index=['XGBoost_Validating'])

mse = mean_squared_error(y_preds_test, y_test)
mae = mean_absolute_error(y_preds_test, y_test)
rmse = mean_squared_error(y_preds_test, y_test, squared=False)

xgboost_test = pd.DataFrame({'mae': mae,
                             'mse': mse,
                             'rmse': rmse},
                             index=['XGBoost_Testing'])

xgboost_models = pd.concat([xgboost_train, xgboost_val, xgboost_test])
xgboost_models
Out[478]:
mae mse rmse
XGBoost_Training 1998.737805 8.201365e+06 2863.802593
XGBoost_Validating 2206.331138 8.736292e+06 2955.721868
XGBoost_Testing 2589.837132 1.261266e+07 3551.430541
In [479]:
fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(20, 6))


axes[0].plot(y_train, y_preds_train, 'o',
             color='orange', label='Predictions')
axes[0].plot(y_train, y_train, '-', color='red', label='Actual')

axes[0].set_xlabel('Actual')
axes[0].set_ylabel('Predicted')
axes[0].set_title('XGBoost_Train: Comparison of Actual vs. Predicted Target')

axes[0].legend()

axes[1].plot(y_val, y_preds_val, 'o',
             color='orange', label='Predictions')
axes[1].plot(y_val, y_val, '-', color='red', label='Actual')

axes[1].set_xlabel('Actual')
axes[1].set_ylabel('Predicted')
axes[1].set_title('XGBoost_Val: Comparison of Actual vs. Predicted Target')

axes[1].legend()

axes[2].plot(y_test, y_preds_test, 'o',
             color='orange', label='Predictions')
axes[2].plot(y_test, y_test, '-', color='red', label='Actual')

axes[2].set_xlabel('Actual')
axes[2].set_ylabel('Predicted')
axes[2].set_title('XGBoost_Test: Comparison of Actual vs. Predicted Target')

axes[2].legend()

plt.show()

The evaluation was performed using the test sets to assess the models' performance of how well the models generalised to unseen data.

Among the models evaluated, the Gradient Boosting Regressor demonstrated the best performance after hyperparameter tuning. It achieved an MAE of 1998.74, 2206.3, and 2589.84 and an RMSE of 2863.8, 2955.72, and 3551.43 on the training, validation, and testing sets, respectively.

Overall, the evaluation confirmed that the Gradient Boosting Regressor effectively predicted the next month's spending, with low MAE and RMSE values and a strong alignment between predicted and actual values in the scatter plot.